I'm fairly new when it comes to multinomial models, but my understanding is that the model coefficients are generally (always?) interpreted in relation to a base or reference case, most software typically choosing either the first or last factor/class. So if there are three classes (k) available to be predicted, the model will return k-1 sets of coefficients.
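To make the reference-class idea concrete, here is a small base-R sketch of my understanding (the coefficient values are made up for illustration, not fitted to any data): with k = 3 classes and the base class's coefficients fixed at zero, only k − 1 = 2 free coefficient sets remain, and the softmax turns the linear predictors into probabilities.

```r
# Toy multinomial logit with 3 classes and a reference (base) class.
# The base class's coefficients are fixed at zero, so only k - 1 = 2
# free coefficient sets remain. (Numbers are made up for illustration.)
x <- c(1, 5.1, 3.5)                 # intercept term, then two predictors
b.versicolor <- c(-2.0, 1.0, -1.5)  # log-odds vs. the base class
b.virginica  <- c(-3.0, 1.2, -1.8)

# Linear predictors; the base class's is identically zero
eta <- c(setosa     = 0,
         versicolor = sum(b.versicolor * x),
         virginica  = sum(b.virginica  * x))

# Softmax turns the linear predictors into probabilities
probs <- exp(eta) / sum(exp(eta))
probs
sum(probs)  # always 1
```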
If this is true, then I have three things I would like to understand/accomplish from this post:
- Why does `glmnet` present coefficients for each of the classes, and how should they be interpreted?
- How might one convert/re-parameterize the coefficients returned by `glmnet` to put them in terms of a base case?
- BONUS: What is the correct way of exponentiating the coefficients such that one could generate predicted probabilities "by hand"?
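For context on the bonus question, this is what I mean by "by hand" (a base-R sketch with made-up coefficients, not the fitted values): given a full k-row coefficient matrix in `glmnet`'s style, I would expect to exponentiate each class's linear predictor and normalize.

```r
# Hypothetical k x (p + 1) coefficient matrix in glmnet's style:
# one row per class, first column the intercept. Values are made up.
B <- rbind(setosa     = c( 4.0, -1.5,  1.3),
           versicolor = c(-1.4,  0.7, -0.7),
           virginica  = c(-2.7,  0.9, -0.6))
x <- c(1, 5.1, 3.5)  # 1 for the intercept, then the predictor values

eta   <- drop(B %*% x)             # one linear predictor per class
probs <- exp(eta) / sum(exp(eta))  # softmax: exponentiate, then normalize
round(probs, 4)
```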
I've read every post I could find on this subject, like this one, but I was hoping for some validation or explanation of what the author wrote.
I'll build two multinomial models predicting `Species` from `Sepal.Length` and `Sepal.Width` on everyone's favorite dataset: one with `glmnet::glmnet(family = "multinomial")` and one with `nnet::multinom()`. Fitting with `nnet` presents the coefficients as I would have expected: relative to the base case (setosa).
```r
# needed for glmnet x/y notation
iris.x <- as.matrix(iris[1:2])
iris.y <- iris$Species  # glmnet accepts a factor response for multinomial

# fitting via glmnet
mod.glmnet <- glmnet::glmnet(
  x = iris.x,
  y = iris.y,
  family = "multinomial",
  lambda = 0,
  type.multinomial = "grouped"
)

# fitting via nnet
mod.nnet <- nnet::multinom(
  Species ~ Sepal.Length + Sepal.Width,
  data = iris
)
```
This is where I'm lost. The person who posted the solution linked in the Stack Exchange post above suggested that one obtains the coefficients in terms of the base case by subtracting the base-case coefficients from the rest.
```r
cf.glmnet <- coef(mod.glmnet)
cf.nnet <- coef(mod.nnet)

# cleaning up glmnet coefs
library(dplyr, warn.conflicts = FALSE)
cf.glmnet2 <- cf.glmnet %>%
  lapply(as.matrix) %>%
  Reduce(cbind, x = .) %>%
  t()
rownames(cf.glmnet2) <- names(cf.glmnet)
colnames(cf.glmnet2) <- colnames(cf.nnet)
cf.glmnet2
#>            (Intercept) Sepal.Length Sepal.Width
#> setosa        40.53453   -15.236896   13.391018
#> versicolor   -13.75194     6.668503   -6.897809
#> virginica    -26.78259     8.568392   -6.493209
cf.nnet
#>            (Intercept) Sepal.Length Sepal.Width
#> versicolor   -92.09924     40.40326   -40.58755
#> virginica   -105.10096     42.30094   -40.18799

# re-parameterizing via subtraction?
cf.glmnet2 %>%
  tail(2) %>%
  apply(1, function(x) x - cf.glmnet2[1, ]) %>%
  t()
#>            (Intercept) Sepal.Length Sepal.Width
#> versicolor   -54.28647     21.90540   -20.28883
#> virginica    -67.31712     23.80529   -19.88423
```
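Part of what I'm trying to confirm is whether the multinomial parameterization is only identified up to an additive shift, i.e. whether subtracting one class's row of coefficients from every row should leave the predicted probabilities unchanged. A toy check with made-up numbers (not the fitted coefficients above):

```r
# Made-up k x (p + 1) coefficient matrix, one row per class
B <- rbind(setosa     = c( 4.0, -1.5,  1.3),
           versicolor = c(-1.4,  0.7, -0.7),
           virginica  = c(-2.7,  0.9, -0.6))
x <- c(1, 5.1, 3.5)  # intercept term, then predictor values

softmax <- function(eta) exp(eta) / sum(exp(eta))

# Re-parameterize by subtracting the first (base) row from every row;
# the base row becomes all zeros, matching nnet's convention
B2 <- sweep(B, 2, B[1, ])

p1 <- softmax(drop(B %*% x))
p2 <- softmax(drop(B2 %*% x))
all.equal(p1, p2)  # probabilities are unchanged by the shift
```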
As you can see, the coefficients from the two fitting procedures are quite different, even after the subtraction. I'm assuming there's a fundamental gap in my knowledge here.
One final note: I tagged the `parsnip` package in this post because I would like to be able to use `glmnet`'s regularization with this multinomial model in the `tidymodels` framework via `multinom_reg()`, but I didn't want to proceed until I understood what the model was producing.
If anyone could help with this, I would greatly appreciate it!