Performing GLMM using binomial data


the code you show uses the reprex package (good job! :+1:), but it's not really a reprex in the sense of a reproducible example. The goal of a reprex is to allow us to reproduce, on our machines, the same error (or an error as close as possible), to the error you're getting:

Error in pwrssUpdate(pp, resp, tol = tolPwrss, GQmat = GQmat, compDev = compDev,  : 
  (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
In addition: Warning message:
In optwrap(optimizer, devfun, start, rho$lower, control = control,  :
  convergence code 9999 from optimx

Thus, your expression, even if generated with the reprex package, is not a reprex, because it generates a different error:

#> Error in glmer(smear ~ sex + age + hiv_status + treatmenthistory + month + : could not find function "glmer"

I'll show you quickly how to modify your reprex stub into an actual reprex, and then give you a link to learn more about building reprexes yourself: first of all, in a reprex you need to load all and only the packages which are strictly necessary to reproduce your problem. In your case, the only package needed is lme4, thus start by modifying your code to:

m36 <- glmer(smear ~ sex + age + hiv_status + treatmenthistory + month + (1|serialnumber),
             data = phd_dat, family = binomial("logit"), 
             control = glmerControl(optimizer = "bobyqa",, optCtrl = list(maxfun = 100000)),
             nAGQ = 25)

Now, try creating a reprex with it:

#> Loading required package: Matrix
m36 <- glmer(smear ~ sex + age + hiv_status + treatmenthistory + month + (1|serialnumber),
             data = phd_dat, family = binomial("logit"), 
             control = glmerControl(optimizer = "bobyqa",, optCtrl = list(maxfun = 100000)),
             nAGQ = 25)
#> Error: 'data' not found, and some variables missing from formula environment

Created on 2018-10-11 by the reprex package (v0.2.1)

Session info
#> R version 3.5.0 (2018-04-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.5 LTS
#> Matrix products: default
#> BLAS: /usr/lib/atlas-base/atlas/
#> LAPACK: /usr/lib/atlas-base/atlas/
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> other attached packages:
#> [1] lme4_1.1-18-1 Matrix_1.2-14
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_0.12.19    lattice_0.20-35 digest_0.6.18   rprojroot_1.3-2
#>  [5] MASS_7.3-49     grid_3.5.0      nlme_3.1-137    backports_1.1.2
#>  [9] magrittr_1.5    evaluate_0.12   stringi_1.2.4   minqa_1.2.4    
#> [13] nloptr_1.2.1    rmarkdown_1.10  splines_3.5.0   tools_3.5.0    
#> [17] stringr_1.3.1   yaml_2.2.0      compiler_3.5.0  htmltools_0.3.6
#> [21] knitr_1.20

See? You're getting a new error, and again an error which is not the same you're getting. Why? Because now your code includes all and only the necessary packages, but it doesn't include the data necessary to make it run (which I guess reside in the phd_dat dataframe). Now, probably you don't want to share publicly data containing information on smear, sex, age, hiv_status, treatmenthistory, etc. for the subjects you're following. Or maybe you want, but it's a big data set and thus it would be inconvenient for you to share it with us (and for us to test it). In this case, you have three options:

  • reproduce your error with one of the default datasets included in base R or in the package lme4. Thisi s not always possible.
  • try creating a small, fake data set, which reproduces the error you're getting. This is a great option.
  • extract a random subset from your actual data set, and include it into your code using the function dput. This works if you don't have sensitive data.

For more info about reprexes, please see here .

PS your issue seems to be optimization-related. Once we manage to have a reprex up and running, it could be a good idea for you to ask on Stack Overflow or Cross Validated...I don't want to jinx you, but frequentist estimation of GLMMs can be...not straightforward. Maybe you'd have better luck with a Bayesian approach:

1 Like