Overview
I'm fairly new to multinomial models, but my understanding is that the model coefficients are generally (always?) interpreted in relation to a base or reference class, with most software choosing either the first or last factor level. So if there are k = 3 classes available to be predicted, the model will return k - 1 sets of coefficients.
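For concreteness, here is a tiny base-R sketch of how I understand that reference-class setup; the coefficient values are made up for illustration and do not come from any fitted model:

```r
# With k = 3 classes, a reference-class parameterization estimates only
# k - 1 coefficient vectors; the base class ("setosa" here) is implicitly
# all zeros. Coefficient values are made up for illustration.
b <- rbind(
  versicolor = c(-1.0, 0.5),
  virginica  = c(-2.0, 0.8)
)
x <- c(1, 3.2)  # 1 for the intercept, then one predictor value

eta <- c(setosa = 0, drop(b %*% x))  # linear predictors; base class fixed at 0
p <- exp(eta) / sum(exp(eta))        # softmax -> probabilities summing to 1
round(p, 3)
#>     setosa versicolor  virginica
#>      0.219      0.398      0.383
```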
If this is true, then I have three things I would like to understand/accomplish from this post:
- Why does `glmnet` present coefficients for each of the classes, and how should they be interpreted?
- How might one convert/re-parameterize the coefficients returned by `glmnet` to put them in terms of a base case?
- BONUS: What is the correct way of exponentiating the coefficients such that one could generate predicted probabilities "by hand"?
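To make the BONUS question concrete, here is my current guess at the "by hand" calculation, assuming the predicted probabilities come from a softmax over all k linear predictors. The coefficient values are rounded from the glmnet fit shown further down, and the observation is the first row of iris (a setosa):

```r
# Guess at the "by hand" probability calculation: a softmax over all k
# linear predictors. Whether this is the correct way to exponentiate the
# coefficients is exactly what I'm asking.
cf <- rbind(
  setosa     = c( 40.535, -15.237, 13.391),
  versicolor = c(-13.752,   6.669, -6.898),
  virginica  = c(-26.783,   8.568, -6.493)
)
x <- c(1, 5.1, 3.5)  # intercept, Sepal.Length, Sepal.Width

eta <- drop(cf %*% x)
p <- exp(eta - max(eta)) / sum(exp(eta - max(eta)))  # numerically stable softmax
round(p, 6)
```

This puts essentially all of the probability mass on setosa, which at least matches the true label of that row.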
I've read every post I could find on this subject, like this one, but I was hoping for some validation or further explanation of what the author wrote.
Example
Model building
I'll build two multinomial models, one with `glmnet::glmnet(family = "multinomial")` and one with `nnet::multinom()`, predicting `Species` by `Sepal.Length` and `Sepal.Width` from everyone's favorite dataset. Fitting with `nnet` presents the coefficients as I would have expected: in relation to the base case (setosa).
# needed for glmnet's x/y notation
iris.x <- as.matrix(iris[1:2])
iris.y <- as.matrix(iris[5])
# fitting via glmnet
mod.glmnet <- glmnet::glmnet(
  x = iris.x, y = iris.y,
  family = "multinomial",
  lambda = 0,
  type.multinomial = "grouped"
)
# fitting via nnet
mod.nnet <- nnet::multinom(
  Species ~ Sepal.Length + Sepal.Width,
  data = iris
)
Inspect Coefficients
This is where I'm lost. The person who posted the solution linked in the Stack Exchange post above suggested that one obtains the coefficients in terms of the base case by subtracting the base case's coefficients from those of the other classes.
cf.glmnet <- coef(mod.glmnet)
cf.nnet <- coef(mod.nnet)
# cleaning up glmnet coefs
library(dplyr, warn.conflicts = FALSE)
cf.glmnet2 <-
  cf.glmnet %>%
  lapply(as.matrix) %>%
  Reduce(cbind, x = .) %>%
  t()
rownames(cf.glmnet2) <- names(cf.glmnet)
colnames(cf.glmnet2) <- colnames(cf.nnet)
cf.glmnet2
#> (Intercept) Sepal.Length Sepal.Width
#> setosa 40.53453 -15.236896 13.391018
#> versicolor -13.75194 6.668503 -6.897809
#> virginica -26.78259 8.568392 -6.493209
cf.nnet
#> (Intercept) Sepal.Length Sepal.Width
#> versicolor -92.09924 40.40326 -40.58755
#> virginica -105.10096 42.30094 -40.18799
# re-parameterizing via subtraction?
cf.glmnet2 %>%
  tail(2) %>%
  apply(1, function(x) x - cf.glmnet2[1, ]) %>%
  t()
#> (Intercept) Sepal.Length Sepal.Width
#> versicolor -54.28647 21.90540 -20.28883
#> virginica -67.31712 23.80529 -19.88423
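If I understand the subtraction trick correctly, it relies on the softmax being invariant to shifting every class's coefficient vector by the same amount, so the re-parameterization should not change the predicted probabilities. A quick base-R check with arbitrary made-up numbers seems to confirm that property:

```r
# Softmax is invariant to subtracting a common vector from every class's
# coefficients, which is why re-expressing coefficients relative to a base
# class should leave the predicted probabilities unchanged.
softmax <- function(eta) exp(eta - max(eta)) / sum(exp(eta - max(eta)))

cf <- rbind(a = c(1.0, -0.5), b = c(0.2, 0.3), c = c(-1.1, 0.7))  # arbitrary
x  <- c(1, 2.5)

p1 <- softmax(drop(cf %*% x))

# subtract class "a"'s coefficients from every row (row "a" becomes zeros)
cf2 <- sweep(cf, 2, cf["a", ])
p2 <- softmax(drop(cf2 %*% x))

all.equal(p1, p2)  # TRUE
```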
As you can see, the coefficients from the two fitting procedures are still quite different, even after the subtraction. I'm assuming there's a fundamental gap in my knowledge here.
One final note: I tagged the `parsnip` package in this post because I would eventually like to use glmnet's regularization with this multinomial model in the tidymodels framework via `multinom_reg()`, but I didn't want to proceed until I understood what the model was producing.
If anyone could help with this, I would greatly appreciate it. Thanks!