Zero-inflated negative binomial regression

PalomaLlorente · December 30, 2019, 9:24am

Hi community!

I am working with zero-inflated negative binomial regression and I am having an error message when trying to apply the regression. The error message is: << Error in solve.default(as.matrix(fit$hessian)) :
system is computationally singular: reciprocal condition number = 6.75794e-34 >>
I think it is related to matrix size or that it is not invertible, but I do not know how to solve it... I would like to attach the real code but it is so big to create a reprex (about 50000 rows), however, I have created an example one:

library("pscl")
#> Classes and Methods for R developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University
#> Simon Jackman
#> hurdle and zeroinfl functions by Achim Zeileis
analysis_example<-data.frame(
          CM = c(0, 0, 0, 0, 2, 4, 1, 2, 2, 12, 6, 2),
       Heavy = c(2.08333333333333, 5.53191489361702, 2.99625468164794,
                 2.75862068965517, 10.2040816326531, 0.390625,
                 0.613496932515337, 7.14285714285714, 1.63934426229508, 0.609756097560976,
                 1.06382978723404, 1.16959064327485)
)
CM.zinb <- zeroinfl(CM ~ Heavy | Heavy, data = analysis_example, link = "logit", dist = "negbin", trace = TRUE, EM = FALSE)
#> Zero-inflated Count Model
#> count model: negbin with log link
#> zero-inflation model: binomial with logit link
#> dependent variable:
#>  0  1  2  3  4  5  6  7  8  9 10 11 12 
#>  4  1  4  0  1  0  1  0  0  0  0  0  1 
#> generating starting values...done
#> calling optim() for ML estimation:
#> initial  value 26.057212 
#> iter  10 value 24.632059
#> final  value 24.631828 
#> converged

However this reprex is not representative as it has converged...real one not: << Zero-inflated Count Model
count model: negbin with log link zero-inflation model: binomial with logit link dependent variable:
0 1 2 3 4 5 6 7 8 9 10 11 12
41485 113 56 8 12 1 1 1 0 0 0 0 1
generating starting values...done
calling optim() for ML estimation:
initial value 2591.569217
iter 10 value 1657.661886
iter 20 value 1439.561545
final value 1439.397026
converged
Error in solve.default(as.matrix(fit$hessian)) :
system is computationally singular: reciprocal condition number = 6.75794e-34>>

Thanks in advance for your help

FJCC · December 30, 2019, 1:24pm

Can you please show the information returned by using the summary() function on your data frame and also the actual call to zeroinfl() that you are using?

PalomaLlorente · December 30, 2019, 2:59pm

Yes, here it goes:

summary(datos)
#>       Flt                FL             Time              Mor        
#>  Min.   :   1.00   Min.   :  4.0   Min.   :   1.00   Min.   :  0.00  
#>  1st Qu.:   1.00   1st Qu.:182.0   1st Qu.:  33.61   1st Qu.: 23.44  
#>  Median :   1.00   Median :277.0   Median :  69.00   Median : 66.67  
#>  Mean   :  21.23   Mean   :258.7   Mean   : 112.57   Mean   : 60.72  
#>  3rd Qu.:  12.00   3rd Qu.:344.0   3rd Qu.: 147.00   3rd Qu.:100.00  
#>  Max.   :1705.00   Max.   :515.0   Max.   :2164.00   Max.   :100.00  
#>       Aft              Ngt              Lng                C.M           
#>  Min.   :  0.00   Min.   :  0.00   Min.   :  0.3353   Min.   : 0.000000  
#>  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  3.2730   1st Qu.: 0.000000  
#>  Median :  0.00   Median :  0.00   Median :  7.5335   Median : 0.000000  
#>  Mean   : 24.05   Mean   : 15.23   Mean   : 12.4277   Mean   : 0.007846  
#>  3rd Qu.: 40.00   3rd Qu.: 17.86   3rd Qu.: 16.2491   3rd Qu.: 0.000000  
#>  Max.   :100.00   Max.   :100.00   Max.   :271.0034   Max.   :12.000000  
#>       Tr.C              C.E                Zona             Hev         
#>  Min.   :   1.00   Min.   :0.000000   Min.   : 1.000   Min.   :  0.000  
#>  1st Qu.:   1.00   1st Qu.:0.000000   1st Qu.: 2.000   1st Qu.:  0.000  
#>  Median :   1.00   Median :0.000000   Median : 4.000   Median :  0.000  
#>  Mean   :  21.23   Mean   :0.003455   Mean   : 4.698   Mean   :  6.439  
#>  3rd Qu.:  12.00   3rd Qu.:0.000000   3rd Qu.: 5.000   3rd Qu.:  0.000  
#>  Max.   :1705.00   Max.   :2.000000   Max.   :28.000   Max.   :100.000  
#>       Mdm              Lgt               L_M                 Asc        
#>  Min.   :  0.00   Min.   :  0.000   Min.   :  0.00000   Min.   :  0.00  
#>  1st Qu.: 87.50   1st Qu.:  0.000   1st Qu.:  0.00000   1st Qu.:  0.00  
#>  Median :100.00   Median :  0.000   Median :  0.00000   Median :  0.00  
#>  Mean   : 84.39   Mean   :  9.087   Mean   :  0.08117   Mean   : 20.24  
#>  3rd Qu.:100.00   3rd Qu.:  0.000   3rd Qu.:  0.00000   3rd Qu.: 22.22  
#>  Max.   :100.00   Max.   :100.000   Max.   :100.00000   Max.   :100.00  
#>       Des              Crs        
#>  Min.   :  0.00   Min.   :  0.00  
#>  1st Qu.:  0.00   1st Qu.:  0.00  
#>  Median :  0.00   Median : 50.00  
#>  Mean   : 28.22   Mean   : 51.54  
#>  3rd Qu.: 60.55   3rd Qu.:100.00  
#>  Max.   :100.00   Max.   :100.00

That is the data. And this is the zeroinfl() I use:

Num_flights<-datos$Flt
FL<-datos$FL
Tiempo_Medio<-datos$Time
Num_Ascenso<-datos$Asc
Num_Descenso<-datos$Des
Num_Crucero<-datos$Crs
Num_Manana<-datos$Mor
Num_Tarde<-datos$Aft
Num_Noche<-datos$Ngt
Long<-datos$Lng
Trafico_Enfrentado<-datos$Tr.C
Num_Conf_MISMO<-datos$C.M
Num_Conf_ENTRE<-datos$C.E
Zona<-datos$Zona
Heavy<-datos$Hev
Medium<-datos$Mdm
Light<-datos$Lgt
Light_Medium<-datos$L_M

CM2.zinb <- zeroinfl(Num_Conf_MISMO ~ Heavy | Heavy, data = datos, link = "logit", dist = "negbin", trace = TRUE, EM = FALSE)
#> Zero-inflated Count Model
#> count model: negbin with log link
#> zero-inflation model: binomial with logit link
#> dependent variable:
#>     0     1     2     3     4     5     6     7     8     9    10    11 
#> 41485   113    56     8    12     1     1     1     0     0     0     0 
#>    12 
#>     1 
#> generating starting values...done
#> calling optim() for ML estimation:
#> initial  value 2591.569217 
#> iter  10 value 1657.661886
#> iter  20 value 1439.561545
#> final  value 1439.397026 
#> converged
#> Error in solve.default(as.matrix(fit$hessian)): sistema es computacionalmente singular: número de condición recíproco = 6.75794e-34

The reason I am only using 1 variable for the zero inflated regression is that I need to know what variables are relevant for my study. For that, first I have calculated all the possible models with each variable (that is to say 16 zeroinfl() ) and then, I calculate the AIC of the regressions; the minimum model AIC would be the first variable to be used in the final regression. After that, the process would be repeated incorporating one by one the variables. I read this way of seeking the relevant variables in the article: "Tutorial on Using Regression Models with Count Outcomes using R; A. Alexander Beaujean, Grant B. Morgan, Baylor University" . However process is so tedious and I don´t know if there is another easier way....

Thanks again for your help!

FJCC · December 30, 2019, 3:46pm

I do not have experience with zero-inflated models, so take my advice cautiously.
Try making a vector that designates whether Heavy is greater than zero

HeavyNotZero <- Heavy > 0

and make a table of that and Num_Conf_MISMO

table(Num_Conf_MISMO, HeavyNotZero)

If many of the Num_Conf_MISMO values greater than zero occur only when Heavy is zero then I would expect the fit to not be useful. It would be a sign that Heavy does not tell you much about Num_Conf_MISMO

system · January 20, 2020, 3:46pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.