# Multinomial Logistic Regression

Hello all,

I am trying to model how people choose quick-service restaurants based on such restaurant's 7Ps marketing effort. I have identified one variable for each through review and focus-group discussion. I then prepared the choice set using fractional factorial design that yielded 8 choice sets for combinations of 7Ps set at 2 levels. I then collected the data (n=205), ran a factor analysis to check for multi-collinearity and reduce the number of factors. The analysis brought the no. of factors into 3. I then ran a multinom function in R. It yielded the following outcome:

Coefficients:
(Intercept)     Comp1        Comp2            Comp3
2    -63.65250     25.75926    -147.17591    -59.974763
3    -26.00799    -25.71691    -95.43358       72.425133
4    -27.05487     90.66487     106.98295     14.168667
5    -37.62440    -5.53799      -16.16555       119.825670
6    -45.05999    -95.05371    -112.24665     -14.973211
7     17.44952     41.80775     -19.93649       -2.856781
8    -35.52004     97.67893      133.37735      29.759599

Std. Errors:
(Intercept)       Comp1            Comp2              Comp3
2   34820.034      41017.782      102127.728       39505.746
3   10012.253      7980.372         92570.830        100757.394
4    7955.508       20192.606       2668.002          9496.852
5    9095.771       30302.422       95093.216        83416.330
6   16035.556      8140.474         53513.802        15633.772
7   47130.333      56133.053      111280.164        92469.898
8   21264.009      17137.406       28170.730         72987.495

Residual Deviance: 0.0001186664
AIC: 56.00012


I am unable to locate p value and if the model is fit and my analysis accurate.
Kindly comment.

Best,
Bidhi

Hi, and welcome!

You'll attract more and better answers with a reproducible example, called a reprex. The code without either actual or representative data makes specific comments difficult.

I can, however, offer some general guidance.

Logistic regression bears some underlying similarities to linear regression, but the differences are considerable. If you will be doing much in this area, an essential resources is Applied Logistic Regression 3rd Edition by David W. Hosmer Jr., Stanley Lemeshow and Rodney X. Sturdivant (2009) , the standard text. Unfortunately, it is software agnostic and doesn't include examples in R or any other language. I'm working on an R companion at my blog, but it will be a few weeks until Chapter 1 is complete.

Consider a data frame of binary values and a fully saturated glm model

suppressPackageStartupMessages(library(dplyr))
#> Warning: Missing column names filled in: 'X1' [1]
#> Warning: Duplicated column names deduplicated: 'X1' => 'X1_1' [3]
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   Y = col_double(),
#>   X1_1 = col_double(),
#>   X2 = col_double(),
#>   X3 = col_double(),
#>   X4 = col_double(),
#>   X5 = col_double(),
#>   X6 = col_double(),
#>   X7 = col_double(),
#>   X8 = col_double(),
#>   X9 = col_double(),
#>   X10 = col_double(),
#>   X11 = col_double(),
#>   X12 = col_double(),
#>   X13 = col_double(),
#>   X14 = col_double(),
#>   X15 = col_double(),
#>   X16 = col_double()
#> )
# remove row names and rename first independent variable
binary_data <- binary_data %>% select(-X1) %>% rename(X1 = X1_1)
#> # A tibble: 6 x 17
#>       Y    X1    X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     0     0     0     0     0     0     1     0     0     0     0
#> 2     1     0     0     0     0     0     0     0     1     0     0     0     0
#> 3     1     0     0     0     0     0     0     0     0     0     0     0     0
#> 4     1     0     0     0     0     0     0     0     0     0     0     0     0
#> 5     1     0     0     0     0     0     1     0     0     0     0     0     0
#> 6     1     0     0     1     0     0     0     0     0     0     0     0     0
#> # … with 4 more variables: X13 <dbl>, X14 <dbl>, X15 <dbl>, X16 <dbl>
# create fully saturated model
fit <- glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + X13 + X14 + X15 + X16, data = binary_data, family = binomial(link = "logit"))
summary(fit)
#>
#> Call:
#> glm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 +
#>     X10 + X11 + X12 + X13 + X14 + X15 + X16, family = binomial(link = "logit"),
#>     data = binary_data)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -2.8522   0.2025   0.2916   0.3122   1.9870
#>
#> Coefficients: (3 not defined because of singularities)
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)   1.5533     0.2347   6.619 3.61e-11 ***
#> X1                NA         NA      NA       NA
#> X2           -2.8596     0.2961  -9.659  < 2e-16 ***
#> X3            1.4435     0.2471   5.843 5.13e-09 ***
#> X4           -2.1107     0.2599  -8.121 4.63e-16 ***
#> X5           12.0127   239.4433   0.050   0.9600
#> X6            0.7076     0.3098   2.284   0.0224 *
#> X7            0.3745     0.4453   0.841   0.4002
#> X8            1.5831     0.2570   6.160 7.29e-10 ***
#> X9            2.3236     0.3343   6.950 3.65e-12 ***
#> X10               NA         NA      NA       NA
#> X11          -2.3006     0.2963  -7.763 8.28e-15 ***
#> X12          -3.3779     0.5360  -6.302 2.94e-10 ***
#> X13           0.3780     0.2677   1.412   0.1580
#> X14          -2.4696     0.3938  -6.271 3.58e-10 ***
#> X15           2.4970     0.3740   6.677 2.44e-11 ***
#> X16               NA         NA      NA       NA
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#>     Null deviance: 6153.2  on 9348  degrees of freedom
#> Residual deviance: 4167.3  on 9335  degrees of freedom
#> AIC: 4195.3
#>
#> Number of Fisher Scoring iterations: 12


The first thing to note is the change between Null deviance and Residual deviance at the bottom of the summary. This plays a similar role to the F Statistic in lm. The decrease shows that the model explains more of the variation.

The p-values assess the usefulness or not of the coefficients. To begin, some of the coefficients are NA. It turns out that all of the values are 0; and it should be obvious that they should be eliminated from the model because they tell us literally nothing.

Some of the remaining coefficients fail the significance test at the traditional \alpha = 0.05 level. We'll eliminate those, along with the NA coefficients and rerun the model:

# remove NA and p-values < $\alpha$
fit2 <- glm(Y ~ X2 + X3 + X4 + X6 + X8 + X9 + X11 + X12 + X14 + X15, data = binary_data, family = binomial(link = "logit"))
summary(fit2)
#>
#> Call:
#> glm(formula = Y ~ X2 + X3 + X4 + X6 + X8 + X9 + X11 + X12 + X14 +
#>     X15, family = binomial(link = "logit"), data = binary_data)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -2.8522   0.2025   0.2916   0.3122   1.9870
#>
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)   1.8663     0.1080  17.281  < 2e-16 ***
#> X2           -3.1726     0.2104 -15.082  < 2e-16 ***
#> X3            1.1305     0.1328   8.515  < 2e-16 ***
#> X4           -2.4237     0.1554 -15.598  < 2e-16 ***
#> X6            0.3946     0.2293   1.721   0.0852 .
#> X8            1.2701     0.1505   8.440  < 2e-16 ***
#> X9            2.0106     0.2615   7.690 1.47e-14 ***
#> X11          -2.6136     0.2107 -12.402  < 2e-16 ***
#> X12          -3.6909     0.4939  -7.473 7.83e-14 ***
#> X14          -2.7826     0.3342  -8.327  < 2e-16 ***
#> X15           2.1840     0.3105   7.033 2.01e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#>     Null deviance: 6153.2  on 9348  degrees of freedom
#> Residual deviance: 4170.7  on 9338  degrees of freedom
#> AIC: 4192.7
#>
#> Number of Fisher Scoring iterations: 6


Created on 2019-12-17 by the reprex package (v0.3.0)

We would eliminate X6 in like manner, leading, perhaps to further rounds of elimination.

But let's skip over those to find the missing measure of probability. Strangely, it's not in the glm object, but given by

logLik(fit)
'log Lik.' -2084 (df=14)


What's this? The log likelihood. How to interpret it?

Usually, this is done through conversion to an odds ratio.

odr <- function(x) {
exp(cbind(OR = coef(x), confint(x)))
}
odr(fit2)
#> Waiting for profiling to be done...
#>                     OR       2.5 %      97.5 %
#> (Intercept) 6.46464646 5.258904976  8.03440121
#> X2          0.04189453 0.027435667  0.06266866
#> X3          3.09726563 2.381064505  4.00914584
#> X4          0.08859375 0.065075959  0.11970440
#> X6          1.48385417 0.959876953  2.36568686
#> X8          3.56106908 2.651198481  4.78618345
#> X9          7.46796870 4.587235443 12.86209781
#> X11         0.07327303 0.048110916  0.11004687
#> X12         0.02494960 0.008361653  0.06034486
#> X14         0.06187500 0.031190057  0.11665747
#> X15         8.88164006 5.031918109 17.18366500


Created on 2019-12-17 by the reprex package (v0.3.0)

An odds ratio equal to 1 translates Y s equally likely with or without X; OR > 1, say 2, means more, twice, as likely. OR = 0.5 = less, half as likely.

For goodness of fit, there's the Hosmer-Lemeshow goodness of fit test , which has a null hypothesis H_0, that the fit is poor; accordingly a high p-value is evidence of a good fit. The results are based on dividing the probabilities for the response variable, Y into deciles and then to examine the expected and actual results against their estimates, as shown in the following two tables. Note that the interpretation of p-value is backwards from what you may be used to.

There's much more, but I hope this will get you oriented.

1 Like

Thank you! I will try that. Since my study is on discrete choice modeling, I was also suggest to use mlogit function in R instead of multinom. I will try both.

1 Like

glm includes families for both multiple responses and multiple ordered responses. The underlying link methods differ, but the tools are similar.

Good luck!

@technocrat stats::glm() does not support multinomial regression. See ?family for details.

@Bidhi in the first regression fit, the standard errors are much much larger than the regression coefficients themselves for all the coefficients. Either there's some issue fitting the model, you don't have enough data, of the factors aren't related to the outcome in a meaningful sense.

# Credit: https://stackoverflow.com/questions/42114194/can-multinomial-models-be-estimated-using-generalized-linear-model#42114195 @AdamO
data(VA, package = "MASS")
VA.tab <- table(VA[, c('cell', 'treat')])
summary(glm(Freq ~ cell * treat, data=VA.tab, family=poisson))
#>
#> Call:
#> glm(formula = Freq ~ cell * treat, family = poisson, data = VA.tab)
#>
#> Deviance Residuals:
#> [1]  0  0  0  0  0  0  0  0
#>
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)
#> (Intercept)   2.708e+00  2.582e-01  10.488   <2e-16 ***
#> cell2         6.931e-01  3.162e-01   2.192   0.0284 *
#> cell3        -5.108e-01  4.216e-01  -1.212   0.2257
#> cell4        -1.571e-15  3.651e-01   0.000   1.0000
#> treat2        2.877e-01  3.416e-01   0.842   0.3996
#> cell2:treat2 -7.985e-01  4.534e-01  -1.761   0.0782 .
#> cell3:treat2  4.055e-01  5.323e-01   0.762   0.4462
#> cell4:treat2 -5.108e-01  5.164e-01  -0.989   0.3226
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#>     Null deviance: 1.5371e+01  on 7  degrees of freedom
#> Residual deviance: 4.4409e-15  on 0  degrees of freedom
#> AIC: 53.066
#>
#> Number of Fisher Scoring iterations: 3


Created on 2020-01-02 by the reprex package (v0.3.0)

1 Like