For the following code, how do I set the x and y intercepts to zero?
model <- lm(cm_cases ~ weeknumber, data = df_linelist_yearweeks)
Hi @graceahey,
Modify the model formula as in this example:
data(mtcars)
# Model car engine horsepower as function of cylinder displacement (size)
mod1 <- lm(hp ~ disp, data=mtcars)
mod2 <- lm(hp ~ disp - 1, data=mtcars) # Exclude intercept estimation (set to zero)
mod3 <- lm(hp ~ 0 + disp, data=mtcars) # Model formula here has same effect as in mod2
summary(mod1)
#>
#> Call:
#> lm(formula = hp ~ disp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -48.623 -28.378 -6.558 13.588 157.562
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 45.7345 16.1289 2.836 0.00811 **
#> disp 0.4375 0.0618 7.080 7.14e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 42.65 on 30 degrees of freedom
#> Multiple R-squared: 0.6256, Adjusted R-squared: 0.6131
#> F-statistic: 50.13 on 1 and 30 DF, p-value: 7.143e-08
summary(mod2)
#>
#> Call:
#> lm(formula = hp ~ disp - 1, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -74.65 -28.76 16.60 26.64 156.67
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> disp 0.5925 0.0320 18.52 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 47.24 on 31 degrees of freedom
#> Multiple R-squared: 0.9171, Adjusted R-squared: 0.9144
#> F-statistic: 342.8 on 1 and 31 DF, p-value: < 2.2e-16
summary(mod3)
#>
#> Call:
#> lm(formula = hp ~ 0 + disp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -74.65 -28.76 16.60 26.64 156.67
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> disp 0.5925 0.0320 18.52 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 47.24 on 31 degrees of freedom
#> Multiple R-squared: 0.9171, Adjusted R-squared: 0.9144
#> F-statistic: 342.8 on 1 and 31 DF, p-value: < 2.2e-16
plot(hp ~ disp, data=mtcars,
     xlim=c(0,500),
     ylim=c(0,300))
abline(mod1)
abline(mod2, lty=2)
Created on 2021-04-06 by the reprex package (v2.0.0)
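Applied to the model in the question, the same formula change might look like the sketch below. The original `df_linelist_yearweeks` is not available here, so a small made-up data frame with the same column names stands in for it; the values are purely illustrative and `model_origin` is a hypothetical name.

```r
# Made-up stand-in for df_linelist_yearweeks (illustrative values only)
df_linelist_yearweeks <- data.frame(
  weeknumber = 1:10,
  cm_cases   = c(2, 5, 7, 8, 12, 13, 15, 19, 20, 22)
)

# "0 +" (or "- 1") drops the intercept, forcing the line through the origin
model_origin <- lm(cm_cases ~ 0 + weeknumber, data = df_linelist_yearweeks)

# Only a slope is estimated; there is no "(Intercept)" term
coef(model_origin)

# The prediction at weeknumber = 0 is exactly 0
predict(model_origin, newdata = data.frame(weeknumber = 0))
```

A line through the origin has both its x-intercept and y-intercept at zero, so this one change covers both.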
You may notice that R-squared in the original model is not the same as in the two no-intercept versions. A no-intercept fit is forced to pass through (0,0), whereas a fit with an intercept passes through (\overline{x},\overline{y}). Among other things, R-squared differs because the total sum of squares in y is no longer \sum (y_{i} - \overline{y})^2 but \sum y_{i}^2 in the last two versions. This is not the same model.
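As a rough check of the point about the total sum of squares, the sketch below recomputes R-squared by hand for the no-intercept fit on `mtcars`. The names `rss`, `tss_uncentered`, `tss_centered`, and the `r2_*` variables are illustrative and do not come from the posts above.

```r
data(mtcars)

# No-intercept fit, same formula as mod2 above
mod2 <- lm(hp ~ disp - 1, data = mtcars)

y   <- mtcars$hp
rss <- sum(residuals(mod2)^2)           # residual sum of squares

tss_uncentered <- sum(y^2)              # total SS measured about zero
tss_centered   <- sum((y - mean(y))^2)  # total SS measured about mean(y)

# R-squared as reported by summary() for the no-intercept model
r2_reported <- summary(mod2)$r.squared

# Hand computation using the uncentered total SS
r2_uncentered <- 1 - rss / tss_uncentered

# Same residuals, but measured against the centered total SS
r2_centered <- 1 - rss / tss_centered

print(c(reported   = r2_reported,
        uncentered = r2_uncentered,
        centered   = r2_centered))
```

The first two values should agree, which shows that the reported R-squared uses the sum of squares about zero. The third is smaller, because the same residuals are compared against a smaller total sum of squares.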
The classic solution that leaves the fit unchanged is to center all the variables. Notice that the two fits below are identical in every way except that the y-intercept is zero to within floating-point precision (5.088e-16).
For mtcars:
> summary(lm(mpg~wt+disp, data=mtcars))
Call:
lm(formula = mpg ~ wt + disp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4087 -2.3243 -0.7683 1.7721 6.3484
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.96055 2.16454 16.151 4.91e-16 ***
wt -3.35082 1.16413 -2.878 0.00743 **
disp -0.01773 0.00919 -1.929 0.06362 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.917 on 29 degrees of freedom
Multiple R-squared: 0.7809, Adjusted R-squared: 0.7658
F-statistic: 51.69 on 2 and 29 DF, p-value: 2.744e-10
> summary(lm(scale(mpg, scale=FALSE)~scale(wt, scale=FALSE)+scale(disp, scale=FALSE), data=mtcars))
Call:
lm(formula = scale(mpg, scale = FALSE) ~ scale(wt, scale = FALSE) +
scale(disp, scale = FALSE), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4087 -2.3243 -0.7683 1.7721 6.3484
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.088e-16 5.156e-01 0.000 1.00000
scale(wt, scale = FALSE) -3.351e+00 1.164e+00 -2.878 0.00743 **
scale(disp, scale = FALSE) -1.772e-02 9.190e-03 -1.929 0.06362 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.917 on 29 degrees of freedom
Multiple R-squared: 0.7809, Adjusted R-squared: 0.7658
F-statistic: 51.69 on 2 and 29 DF, p-value: 2.744e-10
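The comparison above can also be checked programmatically rather than by eye. The sketch below centers the variables by subtracting their means, which is equivalent to `scale(x, scale = FALSE)`. The data frame `centered` and the `fit_*` and `slopes_*` names are illustrative.

```r
data(mtcars)

# Center each variable at its mean; units are unchanged
centered <- data.frame(
  mpg_c  = mtcars$mpg  - mean(mtcars$mpg),
  wt_c   = mtcars$wt   - mean(mtcars$wt),
  disp_c = mtcars$disp - mean(mtcars$disp)
)

fit_raw      <- lm(mpg ~ wt + disp, data = mtcars)
fit_centered <- lm(mpg_c ~ wt_c + disp_c, data = centered)

# Drop the intercept and the coefficient names before comparing slopes
slopes_raw      <- unname(coef(fit_raw)[-1])
slopes_centered <- unname(coef(fit_centered)[-1])

# Slopes and R-squared match up to numerical tolerance
all.equal(slopes_raw, slopes_centered)
all.equal(summary(fit_raw)$r.squared, summary(fit_centered)$r.squared)

# The centered model's intercept is numerically zero
abs(coef(fit_centered)[["(Intercept)"]]) < 1e-8
```

Centering moves the point (\overline{x},\overline{y}) to the origin, so the fitted surface passes through zero without the intercept being constrained.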