Hi @e_heilbron -
A couple comments about your example:
-
The fact that you are not specifying the data argument to glm.nb() indicates that the variables (IBDQ32, sexo, etc.) are in your global environment. While this can "work", this approach is error-prone because it is easy to modify some variables and not others (for instance, if you decide to filter your data before modeling).
If you read in data from an external source (for example a csv file or a database), the variables will already be found within a data.frame (and not in the global environment).
-
The warning message generated from MuMIn::model.sel() indicates the models you are comparing include different observations. The most common reason for this is missing values. Before modeling, it's worthwhile to understand the extent of the missingness in the data. You can do this on a per-variable approach by:
library(MASS)
library(dplyr)
# update 'Eth' field of 'quine' data to have two missing values
quine$Eth[c(3, 5)] <- NA
summarise_all(quine, funs(sum(is.na(.))))
#> Eth Sex Age Lrn Days
#> 1 2 0 0 0 0
If you use the modified version of quine above for modeling, then the number of observations will differ depending on whether you include Eth as a predictor.
Differing observations is problematic for statistical comparisons (e.g. F-tests, AIC) of multiple models.
There are sophisticated approaches to handling missing values, such as imputation. However, if you have a small number of missing values, you can just omit rows that have any missing predictor before fitting any of the models. (If you want a detailed discussion on modeling with missing values, Frank Harrell's "Regression Modeling Strategies" is an excellent source).
quine2 <- quine %>% filter(!is.na(Eth))
m1 <- glm.nb(Days ~ Sex, data = quine2)
m2 <- glm.nb(Days ~ Sex + Eth, data = quine2)
MuMIn::model.sel(m1, m2)
#> Model selection table
#> (Intrc) Sex Eth family init.theta df logLik AICc delta weight
#> m2 2.973 + + NB(1.1697) 1.17 4 -545.224 1098.7 0.00 0.996
#> m1 2.723 + NB(1.0663) 1.07 3 -551.752 1109.7 10.94 0.004
#> Abbreviations:
#> family: NB(1.0663) = 'Negative Binomial(1.0663)',
#> NB(1.1697) = 'Negative Binomial(1.1697)'
#> Models ranked by AICc(x)