I am working with zero-inflated negative binomial regression and I am having an error message when trying to apply the regression. The error message is: << Error in solve.default(as.matrix(fit$hessian)) :
system is computationally singular: reciprocal condition number = 6.75794e-34 >>
I think it is related to matrix size or that it is not invertible, but I do not know how to solve it... I would like to attach the real code but it is so big to create a reprex (about 50000 rows), however, I have created an example one:
library("pscl")
#> Classes and Methods for R developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University
#> Simon Jackman
#> hurdle and zeroinfl functions by Achim Zeileis
analysis_example<-data.frame(
CM = c(0, 0, 0, 0, 2, 4, 1, 2, 2, 12, 6, 2),
Heavy = c(2.08333333333333, 5.53191489361702, 2.99625468164794,
2.75862068965517, 10.2040816326531, 0.390625,
0.613496932515337, 7.14285714285714, 1.63934426229508, 0.609756097560976,
1.06382978723404, 1.16959064327485)
)
CM.zinb <- zeroinfl(CM ~ Heavy | Heavy, data = analysis_example, link = "logit", dist = "negbin", trace = TRUE, EM = FALSE)
#> Zero-inflated Count Model
#> count model: negbin with log link
#> zero-inflation model: binomial with logit link
#> dependent variable:
#> 0 1 2 3 4 5 6 7 8 9 10 11 12
#> 4 1 4 0 1 0 1 0 0 0 0 0 1
#> generating starting values...done
#> calling optim() for ML estimation:
#> initial value 26.057212
#> iter 10 value 24.632059
#> final value 24.631828
#> converged
However this reprex is not representative as it has converged...real one not: << Zero-inflated Count Model
count model: negbin with log link zero-inflation model: binomial with logit link dependent variable:
0 1 2 3 4 5 6 7 8 9 10 11 12
41485 113 56 8 12 1 1 1 0 0 0 0 1
generating starting values...done
calling optim() for ML estimation:
initial value 2591.569217
iter 10 value 1657.661886
iter 20 value 1439.561545
final value 1439.397026
converged
Error in solve.default(as.matrix(fit$hessian)) :
system is computationally singular: reciprocal condition number = 6.75794e-34>>
Can you please show the information returned by using the summary() function on your data frame and also the actual call to zeroinfl() that you are using?
summary(datos)
#> Flt FL Time Mor
#> Min. : 1.00 Min. : 4.0 Min. : 1.00 Min. : 0.00
#> 1st Qu.: 1.00 1st Qu.:182.0 1st Qu.: 33.61 1st Qu.: 23.44
#> Median : 1.00 Median :277.0 Median : 69.00 Median : 66.67
#> Mean : 21.23 Mean :258.7 Mean : 112.57 Mean : 60.72
#> 3rd Qu.: 12.00 3rd Qu.:344.0 3rd Qu.: 147.00 3rd Qu.:100.00
#> Max. :1705.00 Max. :515.0 Max. :2164.00 Max. :100.00
#> Aft Ngt Lng C.M
#> Min. : 0.00 Min. : 0.00 Min. : 0.3353 Min. : 0.000000
#> 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 3.2730 1st Qu.: 0.000000
#> Median : 0.00 Median : 0.00 Median : 7.5335 Median : 0.000000
#> Mean : 24.05 Mean : 15.23 Mean : 12.4277 Mean : 0.007846
#> 3rd Qu.: 40.00 3rd Qu.: 17.86 3rd Qu.: 16.2491 3rd Qu.: 0.000000
#> Max. :100.00 Max. :100.00 Max. :271.0034 Max. :12.000000
#> Tr.C C.E Zona Hev
#> Min. : 1.00 Min. :0.000000 Min. : 1.000 Min. : 0.000
#> 1st Qu.: 1.00 1st Qu.:0.000000 1st Qu.: 2.000 1st Qu.: 0.000
#> Median : 1.00 Median :0.000000 Median : 4.000 Median : 0.000
#> Mean : 21.23 Mean :0.003455 Mean : 4.698 Mean : 6.439
#> 3rd Qu.: 12.00 3rd Qu.:0.000000 3rd Qu.: 5.000 3rd Qu.: 0.000
#> Max. :1705.00 Max. :2.000000 Max. :28.000 Max. :100.000
#> Mdm Lgt L_M Asc
#> Min. : 0.00 Min. : 0.000 Min. : 0.00000 Min. : 0.00
#> 1st Qu.: 87.50 1st Qu.: 0.000 1st Qu.: 0.00000 1st Qu.: 0.00
#> Median :100.00 Median : 0.000 Median : 0.00000 Median : 0.00
#> Mean : 84.39 Mean : 9.087 Mean : 0.08117 Mean : 20.24
#> 3rd Qu.:100.00 3rd Qu.: 0.000 3rd Qu.: 0.00000 3rd Qu.: 22.22
#> Max. :100.00 Max. :100.000 Max. :100.00000 Max. :100.00
#> Des Crs
#> Min. : 0.00 Min. : 0.00
#> 1st Qu.: 0.00 1st Qu.: 0.00
#> Median : 0.00 Median : 50.00
#> Mean : 28.22 Mean : 51.54
#> 3rd Qu.: 60.55 3rd Qu.:100.00
#> Max. :100.00 Max. :100.00
That is the data. And this is the zeroinfl() I use:
Num_flights<-datos$Flt
FL<-datos$FL
Tiempo_Medio<-datos$Time
Num_Ascenso<-datos$Asc
Num_Descenso<-datos$Des
Num_Crucero<-datos$Crs
Num_Manana<-datos$Mor
Num_Tarde<-datos$Aft
Num_Noche<-datos$Ngt
Long<-datos$Lng
Trafico_Enfrentado<-datos$Tr.C
Num_Conf_MISMO<-datos$C.M
Num_Conf_ENTRE<-datos$C.E
Zona<-datos$Zona
Heavy<-datos$Hev
Medium<-datos$Mdm
Light<-datos$Lgt
Light_Medium<-datos$L_M
CM2.zinb <- zeroinfl(Num_Conf_MISMO ~ Heavy | Heavy, data = datos, link = "logit", dist = "negbin", trace = TRUE, EM = FALSE)
#> Zero-inflated Count Model
#> count model: negbin with log link
#> zero-inflation model: binomial with logit link
#> dependent variable:
#> 0 1 2 3 4 5 6 7 8 9 10 11
#> 41485 113 56 8 12 1 1 1 0 0 0 0
#> 12
#> 1
#> generating starting values...done
#> calling optim() for ML estimation:
#> initial value 2591.569217
#> iter 10 value 1657.661886
#> iter 20 value 1439.561545
#> final value 1439.397026
#> converged
#> Error in solve.default(as.matrix(fit$hessian)): sistema es computacionalmente singular: número de condición recíproco = 6.75794e-34
The reason I am only using 1 variable for the zero inflated regression is that I need to know what variables are relevant for my study. For that, first I have calculated all the possible models with each variable (that is to say 16 zeroinfl() ) and then, I calculate the AIC of the regressions; the minimum model AIC would be the first variable to be used in the final regression. After that, the process would be repeated incorporating one by one the variables. I read this way of seeking the relevant variables in the article: "Tutorial on Using Regression Models with Count Outcomes using R; A. Alexander Beaujean, Grant B. Morgan, Baylor University" . However process is so tedious and I don´t know if there is another easier way....
I do not have experience with zero-inflated models, so take my advice cautiously.
Try making a vector that designates whether Heavy is greater than zero
HeavyNotZero <- Heavy > 0
and make a table of that and Num_Conf_MISMO
table(Num_Conf_MISMO, HeavyNotZero)
If many of the Num_Conf_MISMO values greater than zero occur only when Heavy is zero then I would expect the fit to not be useful. It would be a sign that Heavy does not tell you much about Num_Conf_MISMO