Using the sjPlot package to plot marginal effects from an mlogit regression

(updated from original post)
I am having a problem plotting marginal effects with the sjPlot package after estimating a choice model with the mlogit package.

Reprex below, including the "subscript out of bounds" error. Any tips?

NOTE: according to the news for update 2.6.3, the sjPlot package is apparently compatible with mlogit.

See more here: https://cran.r-project.org/web/packages/sjPlot/news/news.html


library(mlogit)
#> Warning: package 'mlogit' was built under R version 3.6.2
#> Loading required package: dfidx
#> Warning: package 'dfidx' was built under R version 3.6.2
#> 
#> Attaching package: 'dfidx'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(sjPlot)
#> Warning: package 'sjPlot' was built under R version 3.6.2
library(prediction)
library(effects)
#> Warning: package 'effects' was built under R version 3.6.2
#> Loading required package: carData
#> Warning: package 'carData' was built under R version 3.6.2
#> lattice theme set by effectsTheme()
#> See ?effectsTheme for details.

# create ex. data set. 1 row per alternative (I show 2 respondents). Each resp answers 3 choice sets, w/ 2 alternatives in each set. Unlabeled choice experiment

cedata.1 <- data.frame( id    =  c(1,1,1,1,1,1,2,2,2,2,2,2),    # respondent ID. 
                        QES    = c(1,1,2,2,3,3,1,1,2,2,3,3),   # Choice set (with 2 alternatives)    
                        Alt    = c(1,2,1,2,1,2,1,2,1,2,1,2),   # Alt 1 or Alt 2 in  choice set 
                        LOC    = c(0,0,1,1,0,1,0,1,1,0,0,1),   # attribute describing alternative. binary categorical variable
                        SIZE   = c(1,1,1,0,0,1,0,0,1,1,0,1),   # attribute describing alternative. binary categorical variable
                        Choice = c(0,1,1,0,1,0,0,1,0,1,0,1),   # if alternative is Chosen (1) or not (0)
                        gender = c(1,1,1,1,1,1,0,0,0,0,0,0)   # male or female (repeats for each individual) 
)

# create data format for mlogit (i.e., indexes for panel dataset)
# NOTE: this generates a "logical" structure for the "choice" (dep variable)
cedata.2 <- mlogit.data(cedata.1, shape="long", choice="Choice", alt.var="Alt", id.var="id")

# convert dep var Choice to factor (from logical), as required by plot_model 
cedata.2$Choice <- as.factor(cedata.2$Choice)

# estimate  model. 
ce.model1 <- mlogit(Choice ~  -1 + LOC + SIZE | gender, data=cedata.2)

# plot Marg effect, based on attribute "SIZE"
plot_model(ce.model1, type = "pred", terms = "SIZE")
#> Error in tmp[["fit"]]: subscript out of bounds

# Need help with  error above: "subscript out of bounds" ?? 

ce.model1 <- mlogit(Choice ~ -1 + LOC + SIZE | gender, data=cedata.2)

# plot Marg effect, based on attribute "SIZE"

plot_model(ce.model1)
Warning messages:
1: Transformation introduced infinite values in continuous y-axis
2: Removed 3 rows containing missing values (geom_point).
#> Error in tmp[["fit"]]: subscript out of bounds
ce.model1$SIZE
NULL

This is the result after removing both the type = "pred" and terms = "SIZE" arguments.

Hey, thanks. Perhaps I'm thick-headed, but it seems (?) you solved my error by pointing out another error?

I suspect this has something to do with my reprex dataset being too small/simple to show my problem. But I'm afraid to (and unsure if I'm allowed to) include a bigger/more complex reprex dataset.

Is it possible to be more specific about how you think I should proceed?

Thx
scott


How does it work when you use bigger data locally?

Fair question. Actually I receive different errors including:

plot_model(ce.model, type = "pred", terms = "SIZE")
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

plot_model(ce.model11, type = "int")
Error in tmp[["fit"]] : subscript out of bounds

I should note the following: I'm using a "long" panel dataset created specifically for the mlogit package. In the attached screenshot you can see (1) the (last) 3 columns were created as indices through the "mlogit.data" command and (2) there are 12 rows of data for each individual.

Since all the tutorials/vignettes on these topics use simple models where each row is 1 observation/individual, I think my dataset is causing the problem. But I don't know how...

My suggestions now:

  1. Install R 4.0, if not already there, and then mlogit, sjPlot, effects.
  2. Comment out the plot block and see what upstream errors you get, if any. Not necessarily error messages but any output that comes out NA or is not a factor.

Thx for sticking with me on this. I still seem to be getting some errors.

I did the following:
. upgraded to R 4.0 (had 3.6 before)
. ensured the R upgrade worked in my RStudio. Check.
. reinstalled the 3 packages you named (mlogit, sjPlot, effects) plus another that was required (prediction)
. ran my regression model with mlogit with no errors re NAs or otherwise

So everything worked as it did previously so far …

Then I did the following:

  • First, test a command from sjPlot package that integrates my mlogit results to make a simple html table of coefficients. This worked fine:

tab_model(ce.model.sim)

  • Second, test a command from sjPlot package that integrates my mlogit results to plot coefficients. This worked fine:

plot_model(ce.model.sim, vline.color = "red", sort.est = TRUE)

  • Third, test a command from sjPlot package that integrates my mlogit results to plot marginal effects of a regressor. Received an ERROR:

plot_model(ce.model.sim, type = "pred", terms = "SIZE")

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

  • Fourth, test a command from sjPlot package that integrates my mlogit results to plot an interaction effect (based on a different model with interaction effects). Received an ERROR:

plot_model(ce.model11, type = "int")

Error in tmp[["fit"]] : subscript out of bounds

Help?

Yeah. The first error triggers the second. What frosts me is tmp--I can find no trace of it in the plot_model code.

But Error in contrasts is suggestive. See this S/O post
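
For reference, the contrasts error itself comes from base R and can be reproduced without mlogit or sjPlot. The sketch below uses a made-up data frame `toy` (hypothetical, not from this thread): building a model matrix from a factor that has only one level raises the same message. Whether this is exactly what happens inside plot_model is an assumption, not something confirmed here.

```r
# Hypothetical toy data: factor 'f' has only one level present.
toy <- data.frame(y = c(1, 2, 3), f = factor(c("a", "a", "a")))

# model.matrix() has to build contrasts for 'f', which fails with one level.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    model.matrix(y ~ f, data = toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If a plotting or prediction step ends up passing a factor with a single observed level to the model machinery, this is the kind of failure one would expect to see.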

Thx, that S/O post was informative. I did all 4 troubleshooting checks (see below) but still get the same error (factors must have at least 2 levels).

Then I figured it out: there are in fact “NAs” in my dataset, but NOT in any of the variables used in my regression model.

Instead, they are in the “idx.alt” column, which has length=2 and then a bunch of NAs. See attached screenshots. The last 3 columns (id1, idx.chid, idx.alt) are attached to my data frame via the mandatory “indexing” done by the “mlogit.data” command, which creates a data frame specific to the “mlogit” regression package that I’m using.

As far as I know I cannot run an mlogit regression model (or "plot_model", which is based on said regression results) without these indexing columns. Therefore, I cannot simply remove them in order to avoid the NAs.

Anybody familiar with "mlogit.data" who can help? There are several posts related to "mlogit.data", but none seem to encounter my specific problem (try “problem with mlogit.data” in Google). See also the info on mlogit.data here: https://cran.r-project.org/web/packages/mlogit/vignettes/c2.formula.data.html

Thx
Scott

  • I did the following 4 troubleshooting checks:
    . check number of factor levels (including possible “empty” ones). All are non-empty and have at least 2 levels.
    . check for NAs. None exist except in indexing variables.
    . check for exotic characters å, ä, ä in dataset. None.
    . finally, I also simplified the dataset to only the relevant variables and excluded everything else (but the indexing variables remain!)
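
The first two checks above can be sketched in base R roughly as follows. This is only an illustration: `dat` is a hypothetical stand-in for the mlogit-formatted data frame, and the column names (including `idx.alt`) are borrowed from this thread rather than taken from the real dataset.

```r
# Hypothetical stand-in for the mlogit-formatted data frame.
dat <- data.frame(
  SIZE    = factor(c(1, 0, 1, 0)),
  LOC     = factor(c(0, 0, 1, 1)),
  idx.alt = c(1, 2, NA, NA)   # mimics an index column that contains NAs
)

# Check 1: number of non-empty levels for each factor column.
is_fac   <- vapply(dat, is.factor, logical(1))
n_levels <- vapply(dat[is_fac], function(x) nlevels(droplevels(x)), integer(1))
print(n_levels)

# Check 2: number of NAs in each column.
na_counts <- colSums(is.na(dat))
print(na_counts)
```

Any factor reporting fewer than 2 levels, or any column with a non-zero NA count, is a candidate for closer inspection.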

Another example that confirms that NAs are causing this error: r - Problem with building mlogit model (with no alternative specific variables) - Cross Validated

I suggest starting a new thread along the lines of "mlogit regression: are index variables required?" I think not, based on this vignette. It's hard to see how any of the index variables contribute anything to the analysis, especially idx.alt, which has only two values: 6030 and NA.

Thx i posted here:
