Stranger things

e_heilbron · February 23, 2018, 10:00pm

the next is a copy of the scip i am working on,as you can see, i try to compare models, i fit them with the same s variables, and i get this error mensagge, why?

mod3<-glm.nb(IBDQ32~sexo+edad+audit)
mod4<-glm.nb(IBDQ32~sexo+edad+audit+caa)
mod5<-glm.nb(IBDQ32~sexo+edad+audit+caa+imc)
mod6<-glm.nb(IBDQ32~sexo+edad+imc)

library(MuMIn)

# Warning message:
# package ‘MuMIn’ was built under R version 3.4.3 

ModelSel.NB<-model.sel(mod3,mod4,mod5,mod6, rank="AICc")

# Warning message:
# In model.sel.default(mod3, mod4, mod5, mod6, rank = "AICc") :
#   models are not all fitted to the same data

mishabalyasin · February 24, 2018, 9:16am

First, I think you are getting a warning, not an error. Nevertheless, you are probably not getting the result you want.

But more importantly, in your code snippet you don't specify data argument when you fit the model. Are you sure you are not failing at this stage? Where are you taking glm.nb from? In all model fitting function I've came across data is a compulsory argument, so I would imagine your error happens there, not at model selection stage.

pete · February 24, 2018, 6:50pm

Or/additionally, you may have missing values in your data. For instance, if caa has 3 missing values, the number observations (the data) will be different between mod3 and mod4.

e_heilbron · February 26, 2018, 5:47pm

but there are variables of the same sheet of data

e_heilbron · February 26, 2018, 5:49pm

the arguments of the models were properly defined, sorry for not post it

mishabalyasin · February 26, 2018, 6:11pm

Next step is for you to create a reprex, so it is easier to see if there are any issues.

FAQ: What's a reproducible example (`reprex`) and how do I create one? meta

Why reprex? Getting unstuck is hard. Your first step here is usually to create a reprex, or reproducible example. The goal of a reprex is to package your code, and information about your problem so that others can run it and feel your pain. Then, hopefully, folks can more easily provide a solution. What's in a Reproducible Example? Parts of a reproducible example: background information - Describe what you are trying to do. What have you already done? complete set up - include any library() calls and data to reproduce your issue. data for a reprex: Here's a discussion on setting up data for a reprex make it run - include the minimal code required to reproduce your error on the data…

e_heilbron · February 26, 2018, 8:04pm

i can not upload! the reprex database and srcipt are already done, but i can not upload it

e_heilbron · February 28, 2018, 9:52pm

i am unstuck! it was only a warning mensagge. now i have to find out the way to report the results

Max · March 1, 2018, 3:25pm

Great! I'm glad it works. This does demonstrate the need for a reprex; otherwise people might have spent a lot of time trying to fix something that wasn't really an issue.

dchilders · March 1, 2018, 5:34pm

Hi @e_heilbron -

A couple comments about your example:

The fact that you are not specifying the data argument to glm.nb() indicates that the variables (IBDQ32, sexo, etc.) are in your global environment. While this can "work", this approach is error-prone because it is easy to modify some variables and not others (for instance, if you decide to filter your data before modeling).

If you read in data from an external source (for example a csv file or a database), the variables will already be found within a data.frame (and not in the global environment).
The warning message generated from MuMIn::model.sel() indicates the models you are comparing include different observations. The most common reason for this is missing values. Before modeling, it's worthwhile to understand the extent of the missingness in the data. You can do this on a per-variable approach by:

library(MASS)
library(dplyr)

# update 'Eth' field of 'quine' data to have two missing values
quine$Eth[c(3, 5)] <- NA
summarise_all(quine, funs(sum(is.na(.))))
#>   Eth Sex Age Lrn Days
#> 1   2   0   0   0    0

If you use the modified version of quine above for modeling, then the number of observations will differ depending on whether you include Eth as a predictor.

Differing observations is problematic for statistical comparisons (e.g. F-tests, AIC) of multiple models.

There are sophisticated approaches to handling missing values, such as imputation. However, if you have a small number of missing values, you can just omit rows that have any missing predictor before fitting any of the models. (If you want a detailed discussion on modeling with missing values, Frank Harrell's "Regression Modeling Strategies" is an excellent source).

quine2 <- quine %>% filter(!is.na(Eth))
m1 <- glm.nb(Days ~ Sex, data = quine2)
m2 <- glm.nb(Days ~ Sex + Eth, data = quine2)
MuMIn::model.sel(m1, m2)
#> Model selection table 
#>    (Intrc) Sex Eth     family init.theta df   logLik   AICc delta weight
#> m2   2.973   +   + NB(1.1697)       1.17  4 -545.224 1098.7  0.00  0.996
#> m1   2.723   +     NB(1.0663)       1.07  3 -551.752 1109.7 10.94  0.004
#> Abbreviations:
#> family: NB(1.0663) = 'Negative Binomial(1.0663)', 
#>         NB(1.1697) = 'Negative Binomial(1.1697)'
#> Models ranked by AICc(x)