Understanding glmnet's multinomial logistic regression coefficients, not k-1?


I'm fairly new when it comes to multinomial models, but my understanding is that the model coefficients are generally (always?) interpreted in relation to a base or reference case, most software typically choosing either the first or last factor/class. So if there are three classes (k) available to be predicted, the model will return k-1 sets of coefficients.

If this is true, then I have three things I would like to understand/accomplish from this post:

  1. Why does glmnet present coefficients for each of the classes...
  2. and how should they be interpreted?
  3. How might one convert/re-parameterize the coefficients returned in glmnet to put them in terms of a base case?
  4. BONUS: What is the correct way of exponentiating the coefficients such that one could generate predicted probabilities "by hand."

I've read every post I could find on this subject like this, but I was hoping for some validation or explanation of what the author wrote.


Model building

I'll build two multinomial models, one with glmnet::glmnet(family = "multinomial"), and one with nnet::multinom(), predicting Species by Sepal.Length and Sepal.Width from everyone's favorite dataset. Fitting with nnet presents the coefficients as I would have expected - in terms of relation to base case (setosa).

# needed for glmnet x/y notation
iris.x <- as.matrix(iris[1:2])
iris.y <- as.matrix(iris[5])

# fitting via glmnet
mod.glmnet <- glmnet::glmnet(
    x = iris.x, y = iris.y,
    family = "multinomial",
    lambda = 0,
    type.multinomial = "grouped"

# fitting via nnet
mod.nnet <- nnet::multinom(
    Species ~ Sepal.Length + Sepal.Width,
    data = iris

Inspect Coefficients

This is where I'm lost. Now, the person who posted the solution linked in the Stack Exchange post above suggested that one obtains the coefficients in terms of base case by subtracting the base case coefficients from the rest.

cf.glmnet <- coef(mod.glmnet)
cf.nnet   <- coef(mod.nnet)

# cleaning up glmnet coefs
library(dplyr, warn.conflicts = FALSE)

cf.glmnet2 <-
    cf.glmnet %>%
    lapply(as.matrix) %>%
    Reduce(cbind, x = .) %>%

rownames(cf.glmnet2) <- names(cf.glmnet)
colnames(cf.glmnet2) <- colnames(cf.nnet)

#>            (Intercept) Sepal.Length Sepal.Width
#> setosa        40.53453   -15.236896   13.391018
#> versicolor   -13.75194     6.668503   -6.897809
#> virginica    -26.78259     8.568392   -6.493209
#>            (Intercept) Sepal.Length Sepal.Width
#> versicolor   -92.09924     40.40326   -40.58755
#> virginica   -105.10096     42.30094   -40.18799

# re-parameterizing via subtraction?
cf.glmnet2 %>%
    tail(2) %>%
    apply(1, function(x) x - cf.glmnet2[1, ]) %>%
#>            (Intercept) Sepal.Length Sepal.Width
#> versicolor   -54.28647     21.90540   -20.28883
#> virginica    -67.31712     23.80529   -19.88423

As you can see, the coefficients between the two fitting procedures are quite different. I'm assuming there's a fundamental gap of knowledge on my part.

One final note is that I tagged the parsnip package in this post because I would like to be able to use glmnets regularization with this multinomial model in the tidymodels framework with multinomial_reg(), but I didn't want to proceed until I understood what the model was producing.

If anyone could help on this I will greatly appreciate it!


I notice that the glm coefficients sum to zero for each variable. For the Intercept:

40.534 - 13.752 - 26.783 = -0.001

That is another way to define the baseline of coefficients.

Interesting - hadn't noticed that. Do you know what the correct way of exponentiating those coefficients is in order to go from the raw data to predicted probabilities for each class?

I agree that subtraction should transform the glmnet coefficients into the nnet coefficients and that does not work. However, if you calculate the difference between virginica and versicolor, the results are very similar between the two sets. For example, on the Intercepts

nnet:  -105.10096 - -92.09924 = -13.002
glmnet: -26.78259 - -13.75194 = -13.031

The distance to the setosa values is where the disagreement is. We clearly share the gap in knowledge, as I cannot figure this out.

The relevant formula for glmnet can be found at
Trevor Hastie's website

Scroll down to the section on multinomial models.

The probability for each class is just the sum of the coefficients times
the covariates, exponentiated, and normalized by the sum of that thing
for all classes. What are the probabilities for an average size setosa?

Get the mean length and width, and add a 1 for the intercept

meansetosa <- iris %>% group_by(Species) %>% 
  summarize(msl = mean(Sepal.Length), msw = mean(Sepal.Width)) %>% 
  select(-1) %>% slice(1) %>% c(1., ., recursive = TRUE)

# matrix multiply to get the vector of numerators
numerators <- cf.glmnet2 %*% meansetosa
numerators <- exp(numerators)
p <- numerators / sum(numerators)

As expected, the probability for setosa is very high, and the other two
are very low. Which is exactly what the predict method gives us.

predict(mod.glmnet, newx = matrix(meansetosa[2:3], nrow=1), type = "response")
#> , , s0
#>         setosa   versicolor    virginica
#> [1,] 0.9999992 6.958413e-07 8.245742e-08

For nnet, we do the same thing, but insert a row of 0 for the reference class coefficients.

cf.nnet2 <- matrix(c(0,0,0, t(cf.nnet)), nrow=3, byrow = TRUE)
numerators <- cf.nnet2 %*% meansetosa
numerators <- exp(numerators)
p <- numerators / sum(numerators)
#>              [,1]
#> [1,] 1.000000e+00
#> [2,] 2.609031e-13
#> [3,] 3.093541e-14

predict(mod.nnet, newdata = data.frame(Sepal.Length = 5.006, Sepal.Width = 3.428), type = "prob")
#>       setosa   versicolor    virginica 
#> 1.000000e+00 2.609031e-13 3.093541e-14

So nnet is doing the same thing, but insert 0 for the coefficients in the reference case.
Created on 2020-02-07 by the reprex package (v0.2.1)


Makes sense! Thanks for tracking down that link to Hastie's website.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.