Zero-inflated negative binomial regression

Hi community!

I am working with zero-inflated negative binomial regression and I am having an error message when trying to apply the regression. The error message is: << Error in solve.default(as.matrix(fit$hessian)) :
system is computationally singular: reciprocal condition number = 6.75794e-34 >>
I think it is related to matrix size or that it is not invertible, but I do not know how to solve it... I would like to attach the real code but it is so big to create a reprex (about 50000 rows), however, I have created an example one:

library("pscl")
#> Classes and Methods for R developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University
#> Simon Jackman
#> hurdle and zeroinfl functions by Achim Zeileis
analysis_example<-data.frame(
          CM = c(0, 0, 0, 0, 2, 4, 1, 2, 2, 12, 6, 2),
       Heavy = c(2.08333333333333, 5.53191489361702, 2.99625468164794,
                 2.75862068965517, 10.2040816326531, 0.390625,
                 0.613496932515337, 7.14285714285714, 1.63934426229508, 0.609756097560976,
                 1.06382978723404, 1.16959064327485)
)
CM.zinb <- zeroinfl(CM ~ Heavy | Heavy, data = analysis_example, link = "logit", dist = "negbin", trace = TRUE, EM = FALSE)
#> Zero-inflated Count Model
#> count model: negbin with log link
#> zero-inflation model: binomial with logit link
#> dependent variable:
#>  0  1  2  3  4  5  6  7  8  9 10 11 12 
#>  4  1  4  0  1  0  1  0  0  0  0  0  1 
#> generating starting values...done
#> calling optim() for ML estimation:
#> initial  value 26.057212 
#> iter  10 value 24.632059
#> final  value 24.631828 
#> converged

However this reprex is not representative as it has converged...real one not: << Zero-inflated Count Model
count model: negbin with log link zero-inflation model: binomial with logit link dependent variable:
0 1 2 3 4 5 6 7 8 9 10 11 12
41485 113 56 8 12 1 1 1 0 0 0 0 1
generating starting values...done
calling optim() for ML estimation:
initial value 2591.569217
iter 10 value 1657.661886
iter 20 value 1439.561545
final value 1439.397026
converged
Error in solve.default(as.matrix(fit$hessian)) :
system is computationally singular: reciprocal condition number = 6.75794e-34>>

Thanks in advance for your help

Can you please show the information returned by using the summary() function on your data frame and also the actual call to zeroinfl() that you are using?

Yes, here it goes:

summary(datos)
#>       Flt                FL             Time              Mor        
#>  Min.   :   1.00   Min.   :  4.0   Min.   :   1.00   Min.   :  0.00  
#>  1st Qu.:   1.00   1st Qu.:182.0   1st Qu.:  33.61   1st Qu.: 23.44  
#>  Median :   1.00   Median :277.0   Median :  69.00   Median : 66.67  
#>  Mean   :  21.23   Mean   :258.7   Mean   : 112.57   Mean   : 60.72  
#>  3rd Qu.:  12.00   3rd Qu.:344.0   3rd Qu.: 147.00   3rd Qu.:100.00  
#>  Max.   :1705.00   Max.   :515.0   Max.   :2164.00   Max.   :100.00  
#>       Aft              Ngt              Lng                C.M           
#>  Min.   :  0.00   Min.   :  0.00   Min.   :  0.3353   Min.   : 0.000000  
#>  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  3.2730   1st Qu.: 0.000000  
#>  Median :  0.00   Median :  0.00   Median :  7.5335   Median : 0.000000  
#>  Mean   : 24.05   Mean   : 15.23   Mean   : 12.4277   Mean   : 0.007846  
#>  3rd Qu.: 40.00   3rd Qu.: 17.86   3rd Qu.: 16.2491   3rd Qu.: 0.000000  
#>  Max.   :100.00   Max.   :100.00   Max.   :271.0034   Max.   :12.000000  
#>       Tr.C              C.E                Zona             Hev         
#>  Min.   :   1.00   Min.   :0.000000   Min.   : 1.000   Min.   :  0.000  
#>  1st Qu.:   1.00   1st Qu.:0.000000   1st Qu.: 2.000   1st Qu.:  0.000  
#>  Median :   1.00   Median :0.000000   Median : 4.000   Median :  0.000  
#>  Mean   :  21.23   Mean   :0.003455   Mean   : 4.698   Mean   :  6.439  
#>  3rd Qu.:  12.00   3rd Qu.:0.000000   3rd Qu.: 5.000   3rd Qu.:  0.000  
#>  Max.   :1705.00   Max.   :2.000000   Max.   :28.000   Max.   :100.000  
#>       Mdm              Lgt               L_M                 Asc        
#>  Min.   :  0.00   Min.   :  0.000   Min.   :  0.00000   Min.   :  0.00  
#>  1st Qu.: 87.50   1st Qu.:  0.000   1st Qu.:  0.00000   1st Qu.:  0.00  
#>  Median :100.00   Median :  0.000   Median :  0.00000   Median :  0.00  
#>  Mean   : 84.39   Mean   :  9.087   Mean   :  0.08117   Mean   : 20.24  
#>  3rd Qu.:100.00   3rd Qu.:  0.000   3rd Qu.:  0.00000   3rd Qu.: 22.22  
#>  Max.   :100.00   Max.   :100.000   Max.   :100.00000   Max.   :100.00  
#>       Des              Crs        
#>  Min.   :  0.00   Min.   :  0.00  
#>  1st Qu.:  0.00   1st Qu.:  0.00  
#>  Median :  0.00   Median : 50.00  
#>  Mean   : 28.22   Mean   : 51.54  
#>  3rd Qu.: 60.55   3rd Qu.:100.00  
#>  Max.   :100.00   Max.   :100.00

That is the data. And this is the zeroinfl() I use:

Num_flights<-datos$Flt
FL<-datos$FL
Tiempo_Medio<-datos$Time
Num_Ascenso<-datos$Asc
Num_Descenso<-datos$Des
Num_Crucero<-datos$Crs
Num_Manana<-datos$Mor
Num_Tarde<-datos$Aft
Num_Noche<-datos$Ngt
Long<-datos$Lng
Trafico_Enfrentado<-datos$Tr.C
Num_Conf_MISMO<-datos$C.M
Num_Conf_ENTRE<-datos$C.E
Zona<-datos$Zona
Heavy<-datos$Hev
Medium<-datos$Mdm
Light<-datos$Lgt
Light_Medium<-datos$L_M

CM2.zinb <- zeroinfl(Num_Conf_MISMO ~ Heavy | Heavy, data = datos, link = "logit", dist = "negbin", trace = TRUE, EM = FALSE)
#> Zero-inflated Count Model
#> count model: negbin with log link
#> zero-inflation model: binomial with logit link
#> dependent variable:
#>     0     1     2     3     4     5     6     7     8     9    10    11 
#> 41485   113    56     8    12     1     1     1     0     0     0     0 
#>    12 
#>     1 
#> generating starting values...done
#> calling optim() for ML estimation:
#> initial  value 2591.569217 
#> iter  10 value 1657.661886
#> iter  20 value 1439.561545
#> final  value 1439.397026 
#> converged
#> Error in solve.default(as.matrix(fit$hessian)): sistema es computacionalmente singular: número de condición recíproco = 6.75794e-34

The reason I am only using 1 variable for the zero inflated regression is that I need to know what variables are relevant for my study. For that, first I have calculated all the possible models with each variable (that is to say 16 zeroinfl() ) and then, I calculate the AIC of the regressions; the minimum model AIC would be the first variable to be used in the final regression. After that, the process would be repeated incorporating one by one the variables. I read this way of seeking the relevant variables in the article: "Tutorial on Using Regression Models with Count Outcomes using R; A. Alexander Beaujean, Grant B. Morgan, Baylor University" . However process is so tedious and I don´t know if there is another easier way....

Thanks again for your help!

I do not have experience with zero-inflated models, so take my advice cautiously.
Try making a vector that designates whether Heavy is greater than zero

HeavyNotZero <- Heavy > 0

and make a table of that and Num_Conf_MISMO

table(Num_Conf_MISMO, HeavyNotZero)

If many of the Num_Conf_MISMO values greater than zero occur only when Heavy is zero then I would expect the fit to not be useful. It would be a sign that Heavy does not tell you much about Num_Conf_MISMO

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.