I'm building a regression model. In my model, I have 38 variables and all are continuous. When I run the model, I see "2 not defined because of singularities". Can someone please help me resolve this?
Thank you so much.
This means that 2 of your covariates are exact linear combinations of your other covariates, so their coefficients cannot be estimated separately. The fix is to remove the variables that are causing the singularity; there is no need to include them, because they add no information the model doesn't already have. Here's a small example to show how this happens. In it, x4 is the redundant variable, so we would remove x4 and re-run the model.
library(tidyverse)
dat <- tibble(x1 = rnorm(200)) %>%
  mutate(
    x2 = rnorm(200),
    x3 = rnorm(200),
    x4 = x1 + 2 * x2 + 3 * x3,
    noise = rnorm(200, 0, .1),
    y = 5 * x1 + 2 * x2 + 3 * x3 + x4 + noise,
    yalt1 = 5 * x1 + 2 * x2 + 3 * x3 + (x1 + 2 * x2 + 3 * x3) + noise,
    yalt2 = 6 * x1 + 4 * x2 + 6 * x3 + noise
  )
# note that y, yalt1, and yalt2 are all the same:
# you don't need x4 since it is a linear combination of x1, x2, and x3
summary(dat)
#>        x1                 x2                 x3                 x4
#>  Min.   :-2.97021   Min.   :-2.35736   Min.   :-2.52121   Min.   :-8.8831
#>  1st Qu.:-0.58417   1st Qu.:-0.67055   1st Qu.:-0.64257   1st Qu.:-2.5376
#>  Median : 0.20379   Median :-0.07776   Median :-0.02061   Median : 0.2286
#>  Mean   : 0.09327   Mean   :-0.07036   Mean   : 0.05496   Mean   : 0.1174
#>  3rd Qu.: 0.78404   3rd Qu.: 0.56112   3rd Qu.: 0.70487   3rd Qu.: 3.0105
#>  Max.   : 2.54535   Max.   : 2.57158   Max.   : 2.94372   Max.   : 8.8523
#>      noise                 y                 yalt1              yalt2
#>  Min.   :-0.279895   Min.   :-26.8173   Min.   :-26.8173   Min.   :-26.8173
#>  1st Qu.:-0.066148   1st Qu.: -6.3419   1st Qu.: -6.3419   1st Qu.: -6.3419
#>  Median : 0.008035   Median :  0.7155   Median :  0.7155   Median :  0.7155
#>  Mean   : 0.002381   Mean   :  0.6103   Mean   :  0.6103   Mean   :  0.6103
#>  3rd Qu.: 0.069915   3rd Qu.:  7.7642   3rd Qu.:  7.7642   3rd Qu.:  7.7642
#>  Max.   : 0.223219   Max.   : 26.2079   Max.   : 26.2079   Max.   : 26.2079
m1 <- lm(y~x1+x2+x3+x4, data=dat)
summary(m1)
#>
#> Call:
#> lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.270296 -0.064250 -0.001033  0.069314  0.213029
#>
#> Coefficients: (1 not defined because of singularities)
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.001700   0.006674   0.255    0.799
#> x1          5.994709   0.006559 913.920   <2e-16 ***
#> x2          3.988100   0.007030 567.328   <2e-16 ***
#> x3          6.006129   0.006810 881.946   <2e-16 ***
#> x4                NA         NA      NA       NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.09364 on 196 degrees of freedom
#> Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
#> F-statistic: 6.581e+05 on 3 and 196 DF, p-value: < 2.2e-16
Created on 2021-04-24 by the reprex package (v2.0.0)
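With 38 covariates it may not be obvious which 2 are the redundant ones. As a rough sketch (not the only way to do it), the NA coefficients tell you which terms lm() dropped, and base R's alias() shows how each dropped term is built from the others. The example below rebuilds data like the above using a plain data.frame; the seed and the names `dropped` and `m2` are just illustrative choices, not anything from the original post.

```r
# Simulated data with the same structure as the example above
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$x4 <- dat$x1 + 2 * dat$x2 + 3 * dat$x3   # exact linear combination
dat$y <- 5 * dat$x1 + 2 * dat$x2 + 3 * dat$x3 + dat$x4 + rnorm(n, 0, 0.1)

m1 <- lm(y ~ x1 + x2 + x3 + x4, data = dat)

# Terms lm() could not estimate come back with NA coefficients
dropped <- names(coef(m1))[is.na(coef(m1))]
dropped

# alias() expresses each dropped term as a combination of the kept terms
alias(m1)$Complete

# Refit without the redundant terms (here this amounts to . ~ . - x4)
m2 <- update(m1, as.formula(paste(". ~ . -", paste(dropped, collapse = " - "))))
summary(m2)
```

The fitted values of m1 and m2 should be the same, since the dropped term carried no extra information. Which variable of a dependent set gets dropped depends on the order of terms in the formula, so you may prefer to remove a different member of that set if it makes more sense for your problem.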