Hi, and welcome!
You'll attract more and better answers with a reproducible example, called a reprex. Without either actual or representative data, it's hard to comment on your specific code.
I can, however, offer some general guidance.
Logistic regression bears some underlying similarities to linear regression, but the differences are considerable. If you will be doing much work in this area, an essential resource is Applied Logistic Regression, 3rd edition, by David W. Hosmer Jr., Stanley Lemeshow and Rodney X. Sturdivant (2013), the standard text. Unfortunately, it is software agnostic and doesn't include examples in R or any other language. I'm working on an R companion at my blog, but it will be a few weeks until Chapter 1 is complete.
Consider a data frame of binary values and a fully saturated glm model:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(readr))
binary_data <- read_csv("https://tuva.s3-us-west-2.amazonaws.com/binary_data.csv")
#> Warning: Missing column names filled in: 'X1' [1]
#> Warning: Duplicated column names deduplicated: 'X1' => 'X1_1' [3]
#> Parsed with column specification:
#> cols(
#> X1 = col_double(),
#> Y = col_double(),
#> X1_1 = col_double(),
#> X2 = col_double(),
#> X3 = col_double(),
#> X4 = col_double(),
#> X5 = col_double(),
#> X6 = col_double(),
#> X7 = col_double(),
#> X8 = col_double(),
#> X9 = col_double(),
#> X10 = col_double(),
#> X11 = col_double(),
#> X12 = col_double(),
#> X13 = col_double(),
#> X14 = col_double(),
#> X15 = col_double(),
#> X16 = col_double()
#> )
# remove row names and rename first independent variable
binary_data <- binary_data %>% select(-X1) %>% rename(X1 = X1_1)
head(binary_data)
#> # A tibble: 6 x 17
#> Y X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0 0 0 0 0 1 0 0 0 0
#> 2 1 0 0 0 0 0 0 0 1 0 0 0 0
#> 3 1 0 0 0 0 0 0 0 0 0 0 0 0
#> 4 1 0 0 0 0 0 0 0 0 0 0 0 0
#> 5 1 0 0 0 0 0 1 0 0 0 0 0 0
#> 6 1 0 0 1 0 0 0 0 0 0 0 0 0
#> # … with 4 more variables: X13 <dbl>, X14 <dbl>, X15 <dbl>, X16 <dbl>
# create fully saturated model (equivalently Y ~ ., since every other column is a predictor)
fit <- glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + X13 + X14 + X15 + X16, data = binary_data, family = binomial(link = "logit"))
summary(fit)
#>
#> Call:
#> glm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 +
#> X10 + X11 + X12 + X13 + X14 + X15 + X16, family = binomial(link = "logit"),
#> data = binary_data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.8522 0.2025 0.2916 0.3122 1.9870
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.5533 0.2347 6.619 3.61e-11 ***
#> X1 NA NA NA NA
#> X2 -2.8596 0.2961 -9.659 < 2e-16 ***
#> X3 1.4435 0.2471 5.843 5.13e-09 ***
#> X4 -2.1107 0.2599 -8.121 4.63e-16 ***
#> X5 12.0127 239.4433 0.050 0.9600
#> X6 0.7076 0.3098 2.284 0.0224 *
#> X7 0.3745 0.4453 0.841 0.4002
#> X8 1.5831 0.2570 6.160 7.29e-10 ***
#> X9 2.3236 0.3343 6.950 3.65e-12 ***
#> X10 NA NA NA NA
#> X11 -2.3006 0.2963 -7.763 8.28e-15 ***
#> X12 -3.3779 0.5360 -6.302 2.94e-10 ***
#> X13 0.3780 0.2677 1.412 0.1580
#> X14 -2.4696 0.3938 -6.271 3.58e-10 ***
#> X15 2.4970 0.3740 6.677 2.44e-11 ***
#> X16 NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 6153.2 on 9348 degrees of freedom
#> Residual deviance: 4167.3 on 9335 degrees of freedom
#> AIC: 4195.3
#>
#> Number of Fisher Scoring iterations: 12
The first thing to note is the change between the Null deviance and the Residual deviance at the bottom of the summary. It plays a role similar to that of the F statistic in lm: the drop from the null model shows that the predictors explain a good deal of the variation.
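If you want to formalize that comparison, one option (a quick sketch, not part of the reprex above) is a chi-squared test on the drop in deviance, with the degrees of freedom taken straight from the fitted object:
# likelihood-ratio style test of the overall model against the null model
with(fit, pchisq(null.deviance - deviance,
                 df.null - df.residual,
                 lower.tail = FALSE))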
The p-values assess whether each coefficient is useful. To begin, some of the coefficients (X1, X10 and X16) are NA. It turns out that those columns contain nothing but zeros, so they tell us literally nothing and obviously should be eliminated from the model.
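A quick way to confirm that (not shown in the reprex) is to sum those columns:
# each sum should be exactly zero if the columns contain nothing but zeros
colSums(binary_data[, c("X1", "X10", "X16")])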
Some of the remaining coefficients fail the significance test at the traditional \alpha = 0.05 level. We'll eliminate those, along with the NA coefficients, and rerun the model:
# remove the NA coefficients and those with p-values > $\alpha$
fit2 <- glm(Y ~ X2 + X3 + X4 + X6 + X8 + X9 + X11 + X12 + X14 + X15, data = binary_data, family = binomial(link = "logit"))
summary(fit2)
#>
#> Call:
#> glm(formula = Y ~ X2 + X3 + X4 + X6 + X8 + X9 + X11 + X12 + X14 +
#> X15, family = binomial(link = "logit"), data = binary_data)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.8522 0.2025 0.2916 0.3122 1.9870
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.8663 0.1080 17.281 < 2e-16 ***
#> X2 -3.1726 0.2104 -15.082 < 2e-16 ***
#> X3 1.1305 0.1328 8.515 < 2e-16 ***
#> X4 -2.4237 0.1554 -15.598 < 2e-16 ***
#> X6 0.3946 0.2293 1.721 0.0852 .
#> X8 1.2701 0.1505 8.440 < 2e-16 ***
#> X9 2.0106 0.2615 7.690 1.47e-14 ***
#> X11 -2.6136 0.2107 -12.402 < 2e-16 ***
#> X12 -3.6909 0.4939 -7.473 7.83e-14 ***
#> X14 -2.7826 0.3342 -8.327 < 2e-16 ***
#> X15 2.1840 0.3105 7.033 2.01e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 6153.2 on 9348 degrees of freedom
#> Residual deviance: 4170.7 on 9338 degrees of freedom
#> AIC: 4192.7
#>
#> Number of Fisher Scoring iterations: 6
Created on 2019-12-17 by the reprex package (v0.3.0)
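If you want a formal comparison of the two nested models, a likelihood-ratio test is one option (a sketch only; the output isn't reproduced here):
# fit2 is nested in fit, so the deviance difference is approximately chi-squared
anova(fit2, fit, test = "Chisq")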
We would eliminate X6 in like manner, perhaps leading to further rounds of elimination.
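One way to do that (again a sketch; the refit isn't shown here) is with update():
# drop X6 from the reduced model and refit
fit3 <- update(fit2, . ~ . - X6)
summary(fit3)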
But let's skip over those rounds to find the missing measure of probability. Strangely, it isn't printed in the summary, but it can be extracted from the glm object with
logLik(fit)
#> 'log Lik.' -2084 (df=14)
What's this? The log likelihood of the fitted model. It's hard to read directly; its main use is in comparing candidate models, and it's also what the AIC in the summary is built from.
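As a quick sanity check (not part of the original output), the AIC reported in the summary is just 2k - 2 * logLik, where k is the number of estimated parameters:
k  <- attr(logLik(fit), "df")   # number of estimated parameters, 14 here
ll <- as.numeric(logLik(fit))   # the log likelihood printed above
2 * k - 2 * ll                  # reproduces the AIC shown by summary(fit)
AIC(fit)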
The coefficients themselves are usually interpreted by converting them to odds ratios:
odr <- function(x) {
  # exponentiate the coefficients and their profile-likelihood confidence
  # intervals to convert from log-odds to odds ratios
  exp(cbind(OR = coef(x), confint(x)))
}
odr(fit2)
#> Waiting for profiling to be done...
#> OR 2.5 % 97.5 %
#> (Intercept) 6.46464646 5.258904976 8.03440121
#> X2 0.04189453 0.027435667 0.06266866
#> X3 3.09726563 2.381064505 4.00914584
#> X4 0.08859375 0.065075959 0.11970440
#> X6 1.48385417 0.959876953 2.36568686
#> X8 3.56106908 2.651198481 4.78618345
#> X9 7.46796870 4.587235443 12.86209781
#> X11 0.07327303 0.048110916 0.11004687
#> X12 0.02494960 0.008361653 0.06034486
#> X14 0.06187500 0.031190057 0.11665747
#> X15 8.88164006 5.031918109 17.18366500
Created on 2019-12-17 by the reprex package (v0.3.0)
An odds ratio of 1 means Y is equally likely with or without X; an OR greater than 1, say 2, means Y is twice as likely when X is present; an OR of 0.5 means it's half as likely. For example, X9's OR of roughly 7.5 means the odds of Y are about 7.5 times higher when X9 = 1 than when X9 = 0.
For goodness of fit, there's the Hosmer-Lemeshow goodness of fit test. Its null hypothesis, H_0, is that the model fits the data well, so here a high p-value is what you're hoping for: it means there's no evidence of lack of fit. The test divides the fitted probabilities for the response variable Y into deciles and compares the observed and expected counts of events in each decile. Note that the way you read the p-value is the reverse of what you may be used to: a small p-value signals a problem with the model, not a welcome result.
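One implementation is hoslem.test() in the ResourceSelection package (the choice of package is mine; any implementation of the test will do):
# install.packages("ResourceSelection") if you don't already have it
library(ResourceSelection)
# compare observed outcomes to fitted probabilities across g = 10 groups
hoslem.test(fit2$y, fitted(fit2), g = 10)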
There's much more, but I hope this will get you oriented.