Interpreting Results from the Firth Logistic Regression Model

Hi All,

Hope you are doing well! I am trying to interpret the results of a Firth logistic regression that I ran in R; please find the details below. I have created a model to predict the probability of an order being turned into a claim. Can you please let me know how to calculate predicted probabilities from this output?

library(readxl)

modeldata4 <- read_excel("12312019/modeldata4.xlsx")
View(modeldata4)
library(logistf)
fit2 = logistf(atozclaim ~ naomiclassi + noofcustomercontacts + numberofojcontacts + shippedlate + deliveredlate + averagetimebetweencommunication + ordervaluebucket, data=modeldata4)
summary(fit2)
logistf(formula = atozclaim ~ naomiclassi + noofcustomercontacts +
numberofojcontacts + shippedlate + deliveredlate + averagetimebetweencommunication +
ordervaluebucket, data = modeldata4)

Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood

                                         coef    se(coef)   lower 0.95   upper 0.95     Chisq           p
(Intercept)                     -5.5596558891 0.293728326 -6.142364054 -4.989316185       Inf 0.000000000
naomiclassi                     -0.0631661281 0.095692841 -0.247926251  0.127752478 0.4302098 0.511886036
noofcustomercontacts             0.1285050139 0.009052602  0.110887744  0.146370626       Inf 0.000000000
numberofojcontacts               0.0076878342 0.004452480 -0.001186831  0.016272319 2.8969748 0.088745965
shippedlate                      0.0558460871 0.124119528 -0.181931303  0.305609760 0.2044196 0.651176866
deliveredlate                    1.4038096962 0.121450488  1.170953972  1.647958476       Inf 0.000000000
averagetimebetweencommunication  0.0007786019 0.001380343 -0.002732755  0.002999507 0.2821846 0.595272906
ordervaluebucket                 0.0671000803 0.025979585  0.016615921  0.118629803 6.8329751 0.008949008

Likelihood ratio test=928.2424 on 7 df, p=0, n=27078
Wald test = 865.5874 on 7 df, p = 0

Covariance-Matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 8.627633e-02 -7.432426e-03 2.891196e-04 7.544731e-05 -1.425216e-02 -1.610249e-02
[2,] -7.432426e-03 9.157120e-03 1.769075e-05 -2.376253e-05 -1.934229e-04 7.325374e-05
[3,] 2.891196e-04 1.769075e-05 8.194960e-05 -2.857597e-05 -6.170443e-05 -9.557881e-05
[4,] 7.544731e-05 -2.376253e-05 -2.857597e-05 1.982458e-05 -6.458494e-05 -6.422542e-05
[5,] -1.425216e-02 -1.934229e-04 -6.170443e-05 -6.458494e-05 1.540566e-02 2.779343e-03
[6,] -1.610249e-02 7.325374e-05 -9.557881e-05 -6.422542e-05 2.779343e-03 1.475022e-02
[7,] 7.756402e-06 1.881604e-08 2.592327e-07 -8.577499e-07 1.726217e-06 -2.593844e-06
[8,] -6.027436e-03 3.322564e-05 -2.659075e-05 -9.320206e-07 -9.323932e-05 1.843344e-04
[,7] [,8]
[1,] 7.756402e-06 -6.027436e-03
[2,] 1.881604e-08 3.322564e-05
[3,] 2.592327e-07 -2.659075e-05
[4,] -8.577499e-07 -9.320206e-07
[5,] 1.726217e-06 -9.323932e-05
[6,] -2.593844e-06 1.843344e-04
[7,] 1.905346e-06 -1.349211e-06
[8,] -1.349211e-06 6.749388e-04

Thanks,
Arun

Please provide a reproducible example, called a reprex, with data that are either those you are working with or a representative standard dataset, such as mtcars.

tibble::tribble(
~orderid, ~naomiclassi, ~noofcustomercontacts, ~numberofojcontacts, ~shippedlate, ~deliveredlate, ~averagetimebetweencommunication, ~ordervaluebucket, ~atozclaim,
4124763, 1, 2, 14, 1, 1, 3.366666, 8, 1,
4122313, 1, 2, 2, 1, 1, 13.933333, 8, 1,
4120369, 1, 2, 4, 1, 1, 0, 10, 1,
4106030, 1, 2, 3, 0, 1, 5.683333, 9, 1,
4104061, 1, 0, 3, 1, 0, 0, 11, 1,
4102438, 1, 9, 11, 1, 0, 3.758333, 12, 1,
4100101, 1, 48, 34, 1, 1, 4.576922, 10, 1,
4099436, 1, 22, 22, 1, 1, 3.252777, 9, 1,
4096928, 1, 8, 16, 1, 1, 2.949999, 10, 1,
4095858, 1, 0, 1, 1, 0, 0, 10, 1,
4094935, 1, 6, 12, 1, 0, 32.920833, 11, 1,
4093557, 1, 2, 10, 1, 1, 4.15, 10, 1,
4091346, 1, 2, 2, 1, 1, 0, 11, 1,
4091022, 1, 11, 14, 1, 0, 5.881481, 10, 1,
4090255, 0, 1, 6, 1, 0, 0.383333, 10, 1,
4090235, 1, 1, 3, 1, 0, 6.533333, 9, 1,
4090107, 1, 4, 10, 1, 1, 8.7, 9, 1,
4090098, 1, 2, 3, 1, 0, 0.016666, 9, 1,
4089524, 1, 8, 16, 1, 1, 0.358333, 8, 1,
4088698, 1, 4, 18, 1, 1, 9.674999, 8, 1,
4179262, 1, 2, 4, 1, 0, 1.05, 9, 0,
4179240, 1, 0, 4, 1, 0, 0, 8, 0,
4179195, 1, 1, 3, 1, 0, 0.25, 9, 0,
4179101, 1, 0, 1, 1, 0, 0, 10, 0,
4179088, 1, 0, 0, 1, 0, 0, 4, 0,
4179059, 1, 0, 1, 1, 0, 0, 5, 0,
4178980, 1, 2, 0, 1, 0, 0, 9, 0,
4178927, 0, 1, 6, 1, 0, 4.883333, 9, 0,
4178887, 0, 0, 0, 1, 0, 0, 9, 0,
4178884, 0, 1, 4, 1, 0, 10.8, 9, 0,
4178855, 1, 0, 0, 1, 0, 0, 9, 0,
4178845, 0, 0, 0, 1, 0, 0, 10, 0,
4178767, 0, 0, 0, 1, 0, 0, 9, 0,
4178695, 1, 0, 0, 1, 0, 0, 7, 0,
4178691, 0, 0, 0, 1, 0, 0, 9, 0,
4178552, 1, 0, 0, 1, 0, 0, 10, 0,
4178441, 1, 0, 0, 1, 0, 0, 10, 0,
4178363, 1, 0, 0, 1, 0, 0, 7, 0,
4178341, 1, 1, 2, 1, 0, 0.566666, 9, 0,
4178230, 1, 0, 3, 1, 0, 0, 7, 0,
4178221, 1, 0, 3, 1, 0, 0, 7, 0,
4178213, 1, 0, 0, 1, 0, 0, 8, 0,
4178123, 1, 0, 0, 1, 0, 0, 9, 0,
4178020, 1, 0, 0, 1, 0, 0, 9, 0,
4177992, 1, 0, 0, 1, 0, 0, 8, 0,
4177950, 0, 0, 0, 1, 0, 0, 9, 0,
4177876, 1, 0, 0, 1, 0, 0, 9, 0,
4177873, 1, 0, 0, 1, 0, 0, 9, 0,
4177822, 1, 0, 0, 1, 0, 0, 8, 0,
4177765, 1, 1, 0, 1, 0, 0, 8, 0,
4177634, 1, 0, 0, 1, 0, 0, 8, 0,
4177586, 1, 0, 0, 1, 0, 0, 10, 0,
4177585, 1, 0, 3, 1, 0, 0, 8, 0,
4177566, 1, 3, 4, 1, 0, 4.866666, 8, 0,
4177482, 1, 0, 0, 1, 0, 0, 9, 0,
4177480, 1, 0, 1, 1, 0, 0, 11, 0,
4177469, 1, 0, 0, 1, 0, 0, 9, 0,
4177461, 1, 0, 0, 1, 0, 0, 9, 0,
4177358, 1, 0, 0, 1, 0, 0, 8, 0,
4177312, 1, 0, 1, 1, 0, 0, 11, 0,
4177256, 1, 0, 0, 1, 0, 0, 9, 0,
4177212, 1, 0, 0, 1, 0, 0, 8, 0,
4177195, 1, 0, 0, 1, 0, 0, 10, 0,
4177159, 1, 0, 0, 1, 0, 0, 7, 0,
4177122, 0, 0, 0, 1, 0, 0, 9, 0,
4177055, 0, 0, 0, 1, 0, 0, 8, 0,
4176867, 1, 0, 0, 1, 0, 0, 8, 0,
4176845, 1, 1, 2, 1, 0, 1.45, 7, 0,
4176819, 1, 3, 1, 1, 0, 5.083333, 10, 0,
4176814, 1, 0, 0, 1, 0, 0, 9, 0
)

@technocrat: Appreciate your response! I have provided the sample data above.

Thanks,
Arun


Also, claims occur at an event rate of about 3% in my data.

Thanks @arunchandra, that helps a lot. I've taken your data and the code above to put it all in one reprex. I've renamed the initial model saturated because it contains all variates, and I've used glm rather than logistf because it's simpler and the principles are the same.

The first thing to notice is the p-values in the summary, which indicate the degree to which each variate contributes to the model. Using the conventional α = 0.05, not all are significant.

The reduced1 model eliminates those. In reduced2 only a single variate is used. One is added back in reduced3.

We can see the differences in the summaries, the log likelihoods, the odds ratios, and the goodness-of-fit tests.

The log likelihood returned by logLik has no direct interpretation on its own (it mainly serves to make calculations tractable), but it is useful for comparing models. The odr function produces odds ratios, a parameter for E(Y | X), the expectation of Y given X. Consider odds ratios of 0.5, 1, and 2: Y given X is half as likely, equally likely, and twice as likely, respectively.
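As a concrete sketch of moving between these scales, the linear predictor is on the log-odds scale, exp() turns it into odds, and plogis() turns it into a probability. Here I use the intercept and deliveredlate coefficient from the Firth summary above, with all other variates held at zero, which is illustrative only:

```r
# Intercept and deliveredlate coefficient from the logistf summary above;
# all other variates held at zero (illustrative only)
intercept       <- -5.5596559
b_deliveredlate <-  1.4038097

eta  <- intercept + b_deliveredlate * 1   # linear predictor: log-odds
odds <- exp(eta)                          # odds = p / (1 - p)
p    <- plogis(eta)                       # probability = odds / (1 + odds)

round(c(log_odds = eta, odds = odds, probability = p), 4)
```

plogis() is the inverse of the qlogis() function plotted below, so the same curve governs both directions of the conversion.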

To your original question: it is indeed possible to convert between odds and probabilities. This snippet plots the relationship between probability and log-odds:

par(las=1,bty="l")
curve(qlogis,from=0.01,to=0.99,
      xlab="probability",ylab="log-odds")
abline(h=0,lty=2)

Can you see the difficulty that this could pose?

suppressPackageStartupMessages(library(dplyr)) 
suppressPackageStartupMessages(library(logistf)) 
suppressPackageStartupMessages(library(ResourceSelection))

# function to compute odds ratio
odr <- function(x) {
  exp(cbind(OR = coef(x), confint(x)))
}

modeldata4 <- tibble::tribble(
~orderid, ~naomiclassi, ~noofcustomercontacts, ~numberofojcontacts, ~shippedlate, ~deliveredlate, ~averagetimebetweencommunication, ~ordervaluebucket, ~atozclaim,
4124763, 1, 2, 14, 1, 1, 3.366666, 8, 1,
4122313, 1, 2, 2, 1, 1, 13.933333, 8, 1,
4120369, 1, 2, 4, 1, 1, 0, 10, 1,
4106030, 1, 2, 3, 0, 1, 5.683333, 9, 1,
4104061, 1, 0, 3, 1, 0, 0, 11, 1,
4102438, 1, 9, 11, 1, 0, 3.758333, 12, 1,
4100101, 1, 48, 34, 1, 1, 4.576922, 10, 1,
4099436, 1, 22, 22, 1, 1, 3.252777, 9, 1,
4096928, 1, 8, 16, 1, 1, 2.949999, 10, 1,
4095858, 1, 0, 1, 1, 0, 0, 10, 1,
4094935, 1, 6, 12, 1, 0, 32.920833, 11, 1,
4093557, 1, 2, 10, 1, 1, 4.15, 10, 1,
4091346, 1, 2, 2, 1, 1, 0, 11, 1,
4091022, 1, 11, 14, 1, 0, 5.881481, 10, 1,
4090255, 0, 1, 6, 1, 0, 0.383333, 10, 1,
4090235, 1, 1, 3, 1, 0, 6.533333, 9, 1,
4090107, 1, 4, 10, 1, 1, 8.7, 9, 1,
4090098, 1, 2, 3, 1, 0, 0.016666, 9, 1,
4089524, 1, 8, 16, 1, 1, 0.358333, 8, 1,
4088698, 1, 4, 18, 1, 1, 9.674999, 8, 1,
4179262, 1, 2, 4, 1, 0, 1.05, 9, 0,
4179240, 1, 0, 4, 1, 0, 0, 8, 0,
4179195, 1, 1, 3, 1, 0, 0.25, 9, 0,
4179101, 1, 0, 1, 1, 0, 0, 10, 0,
4179088, 1, 0, 0, 1, 0, 0, 4, 0,
4179059, 1, 0, 1, 1, 0, 0, 5, 0,
4178980, 1, 2, 0, 1, 0, 0, 9, 0,
4178927, 0, 1, 6, 1, 0, 4.883333, 9, 0,
4178887, 0, 0, 0, 1, 0, 0, 9, 0,
4178884, 0, 1, 4, 1, 0, 10.8, 9, 0,
4178855, 1, 0, 0, 1, 0, 0, 9, 0,
4178845, 0, 0, 0, 1, 0, 0, 10, 0,
4178767, 0, 0, 0, 1, 0, 0, 9, 0,
4178695, 1, 0, 0, 1, 0, 0, 7, 0,
4178691, 0, 0, 0, 1, 0, 0, 9, 0,
4178552, 1, 0, 0, 1, 0, 0, 10, 0,
4178441, 1, 0, 0, 1, 0, 0, 10, 0,
4178363, 1, 0, 0, 1, 0, 0, 7, 0,
4178341, 1, 1, 2, 1, 0, 0.566666, 9, 0,
4178230, 1, 0, 3, 1, 0, 0, 7, 0,
4178221, 1, 0, 3, 1, 0, 0, 7, 0,
4178213, 1, 0, 0, 1, 0, 0, 8, 0,
4178123, 1, 0, 0, 1, 0, 0, 9, 0,
4178020, 1, 0, 0, 1, 0, 0, 9, 0,
4177992, 1, 0, 0, 1, 0, 0, 8, 0,
4177950, 0, 0, 0, 1, 0, 0, 9, 0,
4177876, 1, 0, 0, 1, 0, 0, 9, 0,
4177873, 1, 0, 0, 1, 0, 0, 9, 0,
4177822, 1, 0, 0, 1, 0, 0, 8, 0,
4177765, 1, 1, 0, 1, 0, 0, 8, 0,
4177634, 1, 0, 0, 1, 0, 0, 8, 0,
4177586, 1, 0, 0, 1, 0, 0, 10, 0,
4177585, 1, 0, 3, 1, 0, 0, 8, 0,
4177566, 1, 3, 4, 1, 0, 4.866666, 8, 0,
4177482, 1, 0, 0, 1, 0, 0, 9, 0,
4177480, 1, 0, 1, 1, 0, 0, 11, 0,
4177469, 1, 0, 0, 1, 0, 0, 9, 0,
4177461, 1, 0, 0, 1, 0, 0, 9, 0,
4177358, 1, 0, 0, 1, 0, 0, 8, 0,
4177312, 1, 0, 1, 1, 0, 0, 11, 0,
4177256, 1, 0, 0, 1, 0, 0, 9, 0,
4177212, 1, 0, 0, 1, 0, 0, 8, 0,
4177195, 1, 0, 0, 1, 0, 0, 10, 0,
4177159, 1, 0, 0, 1, 0, 0, 7, 0,
4177122, 0, 0, 0, 1, 0, 0, 9, 0,
4177055, 0, 0, 0, 1, 0, 0, 8, 0,
4176867, 1, 0, 0, 1, 0, 0, 8, 0,
4176845, 1, 1, 2, 1, 0, 1.45, 7, 0,
4176819, 1, 3, 1, 1, 0, 5.083333, 10, 0,
4176814, 1, 0, 0, 1, 0, 0, 9, 0
)
# saturated model
saturated <- glm(atozclaim ~ naomiclassi + noofcustomercontacts + numberofojcontacts + shippedlate + deliveredlate + averagetimebetweencommunication + ordervaluebucket, data=modeldata4)
summary(saturated)
#> 
#> Call:
#> glm(formula = atozclaim ~ naomiclassi + noofcustomercontacts + 
#>     numberofojcontacts + shippedlate + deliveredlate + averagetimebetweencommunication + 
#>     ordervaluebucket, data = modeldata4)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -0.30837  -0.13333  -0.05838   0.00075   0.81595  
#> 
#> Coefficients:
#>                                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                     -0.453560   0.372006  -1.219 0.227376    
#> naomiclassi                      0.089934   0.092032   0.977 0.332267    
#> noofcustomercontacts            -0.020982   0.010089  -2.080 0.041705 *  
#> numberofojcontacts               0.041136   0.013238   3.107 0.002847 ** 
#> shippedlate                     -0.275705   0.289239  -0.953 0.344186    
#> deliveredlate                    0.470521   0.125943   3.736 0.000409 ***
#> averagetimebetweencommunication  0.012304   0.007665   1.605 0.113503    
#> ordervaluebucket                 0.082415   0.024643   3.344 0.001403 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 0.0691428)
#> 
#>     Null deviance: 14.2857  on 69  degrees of freedom
#> Residual deviance:  4.2869  on 62  degrees of freedom
#> AIC: 21.145
#> 
#> Number of Fisher Scoring iterations: 2
logLik(saturated)
#> 'log Lik.' -1.572722 (df=9)
odr(saturated)
#> Waiting for profiling to be done...
#>                                        OR     2.5 %    97.5 %
#> (Intercept)                     0.6353622 0.3064570 1.3172653
#> naomiclassi                     1.0941020 0.9135241 1.3103751
#> noofcustomercontacts            0.9792367 0.9600625 0.9987938
#> numberofojcontacts              1.0419935 1.0153056 1.0693829
#> shippedlate                     0.7590367 0.4305879 1.3380235
#> deliveredlate                   1.6008279 1.2506670 2.0490266
#> averagetimebetweencommunication 1.0123804 0.9972855 1.0277037
#> ordervaluebucket                1.0859060 1.0347043 1.1396414
saturated_gof <- hoslem.test(saturated$y, fitted(saturated), g=10)
saturated_gof
#> 
#>  Hosmer and Lemeshow goodness of fit (GOF) test
#> 
#> data:  saturated$y, fitted(saturated)
#> X-squared = 2.37, df = 8, p-value = 0.9675
saturated_exp_ob <- cbind(saturated_gof$observed,saturated_gof$expected)
saturated_exp_ob
#>                y0 y1      yhat0       yhat1
#> [-0.31,0.0111]  7  0  7.7540233 -0.75402326
#> (0.0111,0.02]  13  0 12.7810589  0.21894112
#> (0.02,0.0608]   1  0  0.9395629  0.06043709
#> (0.0608,0.102] 13  0 11.7591515  1.24084846
#> (0.102,0.157]   1  0  0.8566066  0.14339335
#> (0.157,0.185]   7  1  6.5400318  1.45996815
#> (0.185,0.286]   4  2  4.5915284  1.40847157
#> (0.286,0.614]   4  3  4.4776175  2.52238252
#> (0.614,1.01]    0  7  1.0895950  5.91040496
#> (1.01,1.27]     0  7 -0.7891760  7.78917603
# reduced model with all coefficients having p-values < 0.05
reduced1 <- glm(atozclaim ~ noofcustomercontacts + numberofojcontacts + deliveredlate + ordervaluebucket, data=modeldata4)
summary(reduced1)
#> 
#> Call:
#> glm(formula = atozclaim ~ noofcustomercontacts + numberofojcontacts + 
#>     deliveredlate + ordervaluebucket, data = modeldata4)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -0.34941  -0.14438  -0.09881   0.01671   0.81119  
#> 
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)          -0.67707    0.21841  -3.100 0.002860 ** 
#> noofcustomercontacts -0.02354    0.00978  -2.406 0.018960 *  
#> numberofojcontacts    0.04569    0.01196   3.818 0.000302 ***
#> deliveredlate         0.51981    0.11654   4.460 3.32e-05 ***
#> ordervaluebucket      0.08621    0.02463   3.500 0.000846 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 0.0712616)
#> 
#>     Null deviance: 14.286  on 69  degrees of freedom
#> Residual deviance:  4.632  on 65  degrees of freedom
#> AIC: 20.566
#> 
#> Number of Fisher Scoring iterations: 2
odr(reduced1)
#> Waiting for profiling to be done...
#>                             OR     2.5 %    97.5 %
#> (Intercept)          0.5081026 0.3311614 0.7795845
#> noofcustomercontacts 0.9767386 0.9581936 0.9956426
#> numberofojcontacts   1.0467484 1.0224862 1.0715863
#> deliveredlate        1.6817048 1.3382950 2.1132343
#> ordervaluebucket     1.0900344 1.0386622 1.1439475
logLik(reduced1)
#> 'log Lik.' -4.282998 (df=6)
reduced1_gof <- hoslem.test(reduced1$y, fitted(reduced1), g=10)
reduced1_gof
#> 
#>  Hosmer and Lemeshow goodness of fit (GOF) test
#> 
#> data:  reduced1$y, fitted(reduced1)
#> X-squared = 6.4281, df = 8, p-value = 0.5994
reduced1_exp_ob <- cbind(reduced1_gof$observed,reduced1_gof$expected)
reduced1_exp_ob
#>                 y0 y1     yhat0      yhat1
#> [-0.332,0.0108]  7  0  7.770093 -0.7700926
#> (0.0108,0.0126]  8  0  7.899182  0.1008180
#> (0.0126,0.0988] 18  0 16.339170  1.6608295
#> (0.0988,0.155]   2  0  1.725584  0.2744163
#> (0.155,0.187]    7  0  5.748143  1.2518573
#> (0.187,0.242]    4  3  5.495243  1.5047566
#> (0.242,0.568]    4  3  4.349050  2.6509499
#> (0.568,0.991]    0  7  1.730992  5.2690081
#> (0.991,1.26]     0  7 -1.057457  8.0574569

# minimal model with only one variate
reduced2 <- glm(atozclaim ~ deliveredlate, data=modeldata4)
summary(reduced2)
#> 
#> Call:
#> glm(formula = atozclaim ~ deliveredlate, data = modeldata4)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.1379  -0.1379  -0.1379   0.0000   0.8621  
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    0.13793    0.04182   3.298  0.00155 ** 
#> deliveredlate  0.86207    0.10100   8.536 2.34e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 0.1014199)
#> 
#>     Null deviance: 14.2857  on 69  degrees of freedom
#> Residual deviance:  6.8966  on 68  degrees of freedom
#> AIC: 42.428
#> 
#> Number of Fisher Scoring iterations: 2
logLik(reduced2)
#> 'log Lik.' -18.21412 (df=3)
odr(reduced2)
#> Waiting for profiling to be done...
#>                     OR    2.5 %   97.5 %
#> (Intercept)   1.147896 1.057568 1.245939
#> deliveredlate 2.368055 1.942779 2.886424
reduced2_gof <- hoslem.test(reduced2$y, fitted(reduced2), g=10)
reduced2_gof
#> 
#>  Hosmer and Lemeshow goodness of fit (GOF) test
#> 
#> data:  reduced2$y, fitted(reduced2)
#> X-squared = 2.4865e-29, df = 8, p-value = 1
reduced2_exp_ob <- cbind(reduced2_gof$observed,reduced2_gof$expected)
reduced2_exp_ob
#>           y0 y1 yhat0 yhat1
#> [0.138,1] 50 20    50    20

# reduced model with two variables with lowest p-values
reduced3 <- glm(atozclaim ~ deliveredlate + ordervaluebucket, data=modeldata4)
summary(reduced3)
#> 
#> Call:
#> glm(formula = atozclaim ~ deliveredlate + ordervaluebucket, data = modeldata4)
#> 
#> Deviance Residuals: 
#>      Min        1Q    Median        3Q       Max  
#> -0.35018  -0.15783  -0.07090   0.03453   0.84217  
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)      -0.70777    0.23767  -2.978 0.004037 ** 
#> deliveredlate     0.82614    0.09364   8.822 7.97e-13 ***
#> ordervaluebucket  0.09618    0.02667   3.606 0.000593 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for gaussian family taken to be 0.08620277)
#> 
#>     Null deviance: 14.2857  on 69  degrees of freedom
#> Residual deviance:  5.7756  on 67  degrees of freedom
#> AIC: 32.012
#> 
#> Number of Fisher Scoring iterations: 2
logLik(reduced3)
#> 'log Lik.' -12.00575 (df=4)
odr(reduced3)
#> Waiting for profiling to be done...
#>                         OR     2.5 %    97.5 %
#> (Intercept)      0.4927427 0.3092552 0.7850972
#> deliveredlate    2.2844849 1.9014236 2.7447178
#> ordervaluebucket 1.1009545 1.0448815 1.1600367
reduced3_gof <- hoslem.test(reduced3$y, fitted(reduced3), g=10)
reduced3_gof
#> 
#>  Hosmer and Lemeshow goodness of fit (GOF) test
#> 
#> data:  reduced3$y, fitted(reduced3)
#> X-squared = 2.2295, df = 8, p-value = 0.9732
reduced3_exp_ob <- cbind(reduced3_gof$observed,reduced3_gof$expected)
reduced3_exp_ob
#>                  y0 y1      yhat0      yhat1
#> [-0.323,-0.0345]  8  0  8.7570900 -0.7570900
#> (-0.0345,0.0617] 12  0 11.2601726  0.7398274
#> (0.0617,0.158]   21  2 19.3699137  3.6300863
#> (0.158,0.254]     7  3  7.4599260  2.5400740
#> (0.254,0.35]      2  2  2.5992602  1.4007398
#> (0.35,0.984]      0  8  1.0505549  6.9494451
#> (0.984,1.18]      0  5 -0.4969174  5.4969174

Created on 2019-12-31 by the reprex package (v0.3.0)

There is only one logistic regression model. Maximum likelihood estimates and Firth estimates are two different ways to estimate the parameters in that model. MLE and Firth estimates have similar properties and for most purposes you can interpret Firth estimates just like you would interpret MLE estimates.

Personally, I don't find MLE estimates particularly interpretable themselves, but use the divide-by-4 rule when pressed.
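The divide-by-4 rule: the logistic curve is steepest at p = 0.5, where its slope is 1/4, so β/4 is an upper bound on the change in probability per unit change in a predictor. A quick sketch with the deliveredlate coefficient from the Firth summary above:

```r
b_deliveredlate <- 1.4038097  # from the logistf summary above

# Divide-by-4 rule: upper bound on the change in P(claim) per unit
# change in deliveredlate (the bound is attained near p = 0.5)
b_deliveredlate / 4
# about 0.351: at most a ~35-percentage-point increase in P(claim)
```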

@technocrat: Really appreciate your response! Can you please let me know how to get the ROC curve, confusion matrix, and the predicted probabilities?

Thanks,
ARUN

This post provides an example.
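For reference, here is a base-R sketch of all three on simulated stand-in data; the data, the binomial glm named fit, and the 0.5 cutoff are all assumptions for illustration, and a package such as pROC will draw the full ROC curve:

```r
set.seed(42)
# Simulated stand-in for the order data (hypothetical values)
d <- data.frame(deliveredlate = rbinom(70, 1, 0.3))
d$atozclaim <- rbinom(70, 1, plogis(-2 + 2 * d$deliveredlate))

fit <- glm(atozclaim ~ deliveredlate, data = d, family = binomial)

# Predicted probabilities for each order
p_hat <- predict(fit, type = "response")

# Confusion matrix at an (arbitrary) 0.5 cutoff
table(observed = d$atozclaim, predicted = as.integer(p_hat >= 0.5))

# AUC via the rank (Wilcoxon) formula; pROC::roc() gives the same
# number along with a plotted ROC curve
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(d$atozclaim, p_hat)
```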

Thanks @technocrat!

Thanks,
Arun

@technocrat, @alexpghayes: Can you please provide sample code for doing oversampling and undersampling in R for the same example above?

I tried the following:

ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 10,
                     verboseIter = FALSE,
                     sampling = "down")

Error in trainControl(method = "repeatedcv", number = 10, repeats = 10, :
could not find function "trainControl"

Thanks,
Arun

I have installed the caret package, but I am still getting this error. Can you please help here?
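One likely cause (an assumption, since only the error message is shown): install.packages("caret") downloads the package, but its functions are only found after it is attached with library() in the current session:

```r
library(caret)  # attaching caret puts trainControl() on the search path

ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 10,
                     verboseIter = FALSE,
                     sampling = "down")
```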

See the mlr or imbalance packages for oversample() functions; mlr also has undersample().
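Independent of those packages, down-sampling can be sketched in base R. The data frame here is a simulated stand-in for modeldata4, with roughly the 3% event rate mentioned above:

```r
set.seed(123)
# Simulated stand-in: ~3% of orders become claims
d <- data.frame(atozclaim = rbinom(1000, 1, 0.03),
                x         = rnorm(1000))

claims    <- d[d$atozclaim == 1, ]
nonclaims <- d[d$atozclaim == 0, ]

# Down-sample the majority class to the minority class size
balanced <- rbind(claims,
                  nonclaims[sample(nrow(nonclaims), nrow(claims)), ])

table(balanced$atozclaim)  # equal counts of 0 and 1 by construction
```

Over-sampling is the mirror image: sample the minority class with replacement up to the majority class size.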

@technocrat: Appreciate your response! Can you please let me know which version of R the imbalance package requires?

Thanks,
Arun

Depends: R (>= 3.3.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.