Stranger things

rstudio

#1

the next is a copy of the scip i am working on,as you can see, i try to compare models, i fit them with the same s variables, and i get this error mensagge, why?

mod3<-glm.nb(IBDQ32~sexo+edad+audit)
mod4<-glm.nb(IBDQ32~sexo+edad+audit+caa)
mod5<-glm.nb(IBDQ32~sexo+edad+audit+caa+imc)
mod6<-glm.nb(IBDQ32~sexo+edad+imc)

library(MuMIn)

# Warning message:
# package ‘MuMIn’ was built under R version 3.4.3 

ModelSel.NB<-model.sel(mod3,mod4,mod5,mod6, rank="AICc")

# Warning message:
# In model.sel.default(mod3, mod4, mod5, mod6, rank = "AICc") :
#   models are not all fitted to the same data

#2

First, I think you are getting a warning, not an error. Nevertheless, you are probably not getting the result you want.

But more importantly, in your code snippet you don’t specify data argument when you fit the model. Are you sure you are not failing at this stage? Where are you taking glm.nb from? In all model fitting function I’ve came across data is a compulsory argument, so I would imagine your error happens there, not at model selection stage.


#3

Or/additionally, you may have missing values in your data. For instance, if caa has 3 missing values, the number observations (the data) will be different between mod3 and mod4.


#4

but there are variables of the same sheet of data


#5

the arguments of the models were properly defined, sorry for not post it


#6

Next step is for you to create a reprex, so it is easier to see if there are any issues.


#7

i can not upload! the reprex database and srcipt are already done, but i can not upload it


#8

i am unstuck! it was only a warning mensagge. now i have to find out the way to report the results


#9

Great! I’m glad it works. This does demonstrate the need for a reprex; otherwise people might have spent a lot of time trying to fix something that wasn’t really an issue.


#10

Hi @e_heilbron -

A couple comments about your example:

  1. The fact that you are not specifying the data argument to glm.nb() indicates that the variables (IBDQ32, sexo, etc.) are in your global environment. While this can “work”, this approach is error-prone because it is easy to modify some variables and not others (for instance, if you decide to filter your data before modeling).

    If you read in data from an external source (for example a csv file or a database), the variables will already be found within a data.frame (and not in the global environment).

  2. The warning message generated from MuMIn::model.sel() indicates the models you are comparing include different observations. The most common reason for this is missing values. Before modeling, it’s worthwhile to understand the extent of the missingness in the data. You can do this on a per-variable approach by:

library(MASS)
library(dplyr)

# update 'Eth' field of 'quine' data to have two missing values
quine$Eth[c(3, 5)] <- NA
summarise_all(quine, funs(sum(is.na(.))))
#>   Eth Sex Age Lrn Days
#> 1   2   0   0   0    0

If you use the modified version of quine above for modeling, then the number of observations will differ depending on whether you include Eth as a predictor.

Differing observations is problematic for statistical comparisons (e.g. F-tests, AIC) of multiple models.

There are sophisticated approaches to handling missing values, such as imputation. However, if you have a small number of missing values, you can just omit rows that have any missing predictor before fitting any of the models. (If you want a detailed discussion on modeling with missing values, Frank Harrell’s “Regression Modeling Strategies” is an excellent source).

quine2 <- quine %>% filter(!is.na(Eth))
m1 <- glm.nb(Days ~ Sex, data = quine2)
m2 <- glm.nb(Days ~ Sex + Eth, data = quine2)
MuMIn::model.sel(m1, m2)
#> Model selection table 
#>    (Intrc) Sex Eth     family init.theta df   logLik   AICc delta weight
#> m2   2.973   +   + NB(1.1697)       1.17  4 -545.224 1098.7  0.00  0.996
#> m1   2.723   +     NB(1.0663)       1.07  3 -551.752 1109.7 10.94  0.004
#> Abbreviations:
#> family: NB(1.0663) = 'Negative Binomial(1.0663)', 
#>         NB(1.1697) = 'Negative Binomial(1.1697)'
#> Models ranked by AICc(x)