Coefficients: (2 not defined because of singularities) - Linear Regression Model

I'm building a regression model. In my model, I have 38 variables and all are continuous. When I run the model, I see "2 not defined because of singularities". Can someone please help me resolve this?

Thank you so much.

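This means that 2 of your covariates are exact linear combinations of other covariates in the model, so their coefficients cannot be estimated separately. The fix is to remove the variables that are causing the singularity; there is no need to include them, since they add no information beyond what the other covariates already carry. Here's a small example to show how this happens. In it, x4 is a linear combination of x1, x2, and x3, so lm() reports its coefficient as NA. We would remove x4 and re-run the model.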

library(tidyverse)

dat <- tibble(x1=rnorm(200)) %>%
   mutate(
      x2=rnorm(200),
      x3=rnorm(200),
      x4=x1+2*x2+3*x3,
      noise = rnorm(200, 0, .1),
      y=5*x1+2*x2+3*x3+x4 + noise,
      yalt1=5*x1+2*x2+3*x3+(x1+2*x2+3*x3) + noise,
      yalt2=6*x1+4*x2+6*x3 + noise
   )
# note that y, yalt1, and yalt2 are all the same -
# you don't need x4 since it is a linear combo of the others
summary(dat) 
#>        x1                 x2                 x3                 x4         
#>  Min.   :-2.97021   Min.   :-2.35736   Min.   :-2.52121   Min.   :-8.8831  
#>  1st Qu.:-0.58417   1st Qu.:-0.67055   1st Qu.:-0.64257   1st Qu.:-2.5376  
#>  Median : 0.20379   Median :-0.07776   Median :-0.02061   Median : 0.2286  
#>  Mean   : 0.09327   Mean   :-0.07036   Mean   : 0.05496   Mean   : 0.1174  
#>  3rd Qu.: 0.78404   3rd Qu.: 0.56112   3rd Qu.: 0.70487   3rd Qu.: 3.0105  
#>  Max.   : 2.54535   Max.   : 2.57158   Max.   : 2.94372   Max.   : 8.8523  
#>      noise                 y                yalt1              yalt2         
#>  Min.   :-0.279895   Min.   :-26.8173   Min.   :-26.8173   Min.   :-26.8173  
#>  1st Qu.:-0.066148   1st Qu.: -6.3419   1st Qu.: -6.3419   1st Qu.: -6.3419  
#>  Median : 0.008035   Median :  0.7155   Median :  0.7155   Median :  0.7155  
#>  Mean   : 0.002381   Mean   :  0.6103   Mean   :  0.6103   Mean   :  0.6103  
#>  3rd Qu.: 0.069915   3rd Qu.:  7.7642   3rd Qu.:  7.7642   3rd Qu.:  7.7642  
#>  Max.   : 0.223219   Max.   : 26.2079   Max.   : 26.2079   Max.   : 26.2079
   
m1 <- lm(y~x1+x2+x3+x4, data=dat)
summary(m1)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.270296 -0.064250 -0.001033  0.069314  0.213029 
#> 
#> Coefficients: (1 not defined because of singularities)
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.001700   0.006674   0.255    0.799    
#> x1          5.994709   0.006559 913.920   <2e-16 ***
#> x2          3.988100   0.007030 567.328   <2e-16 ***
#> x3          6.006129   0.006810 881.946   <2e-16 ***
#> x4                NA         NA      NA       NA    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.09364 on 196 degrees of freedom
#> Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
#> F-statistic: 6.581e+05 on 3 and 196 DF,  p-value: < 2.2e-16

Created on 2021-04-24 by the reprex package (v2.0.0)
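
With 38 variables it may not be obvious which ones are the culprits. Here is a minimal sketch of one way to find them, using base R only. The seed, the data frame construction, and the names `dropped`, `keep`, and `m2` are my own assumptions for illustration, not part of the reprex above.

```r
set.seed(42)  # arbitrary seed, only so this sketch is reproducible
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$x4 <- dat$x1 + 2 * dat$x2 + 3 * dat$x3   # exact linear combination of x1..x3
dat$y  <- 6 * dat$x1 + 4 * dat$x2 + 6 * dat$x3 + rnorm(n, 0, 0.1)

m1 <- lm(y ~ x1 + x2 + x3 + x4, data = dat)

# Coefficients that lm() could not estimate are returned as NA
dropped <- names(coef(m1))[is.na(coef(m1))]
print(dropped)

# alias() shows how each dropped term is built from the retained ones
print(alias(m1)$Complete)

# Refit using only the terms that were actually estimated
keep <- setdiff(attr(terms(m1), "term.labels"), dropped)
m2 <- lm(reformulate(keep, response = "y"), data = dat)

# The fitted values are the same with or without the redundant term
print(all.equal(fitted(m1), fitted(m2)))
```

On your own model you would run the `is.na(coef(...))` and `alias()` lines against your fitted object. Which variable of a dependent set gets flagged depends on the order of the terms in the formula, so it is worth looking at the `alias()` output before deciding what to drop.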
