 # Set x & y intercept to zero

For the following code, how do I set the x and y intercepts to zero?

model <- lm(cm_cases ~ weeknumber, data = df_linelist_yearweeks)

Hi @graceahey,
Modify the model formula as in this example:

data(mtcars)

# Model car engine horsepower as function of cylinder displacement (size)
mod1 <- lm(hp ~ disp, data=mtcars)
mod2 <- lm(hp ~ disp - 1, data=mtcars) # Exclude intercept estimation (set to zero)
mod3 <- lm(hp ~ 0 + disp, data=mtcars) # Model formula here has same effect as in mod2

summary(mod1)
#>
#> Call:
#> lm(formula = hp ~ disp, data = mtcars)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -48.623 -28.378  -6.558  13.588 157.562
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  45.7345    16.1289   2.836  0.00811 **
#> disp          0.4375     0.0618   7.080 7.14e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 42.65 on 30 degrees of freedom
#> Multiple R-squared:  0.6256, Adjusted R-squared:  0.6131
#> F-statistic: 50.13 on 1 and 30 DF,  p-value: 7.143e-08
summary(mod2)
#>
#> Call:
#> lm(formula = hp ~ disp - 1, data = mtcars)
#>
#> Residuals:
#>    Min     1Q Median     3Q    Max
#> -74.65 -28.76  16.60  26.64 156.67
#>
#> Coefficients:
#>      Estimate Std. Error t value Pr(>|t|)
#> disp   0.5925     0.0320   18.52   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 47.24 on 31 degrees of freedom
#> Multiple R-squared:  0.9171, Adjusted R-squared:  0.9144
#> F-statistic: 342.8 on 1 and 31 DF,  p-value: < 2.2e-16
summary(mod3)
#>
#> Call:
#> lm(formula = hp ~ 0 + disp, data = mtcars)
#>
#> Residuals:
#>    Min     1Q Median     3Q    Max
#> -74.65 -28.76  16.60  26.64 156.67
#>
#> Coefficients:
#>      Estimate Std. Error t value Pr(>|t|)
#> disp   0.5925     0.0320   18.52   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 47.24 on 31 degrees of freedom
#> Multiple R-squared:  0.9171, Adjusted R-squared:  0.9144
#> F-statistic: 342.8 on 1 and 31 DF,  p-value: < 2.2e-16

plot(hp ~ disp, data=mtcars,
     xlim=c(0,500),
     ylim=c(0,300))
abline(mod1)
abline(mod2, lty=2)

Created on 2021-04-06 by the reprex package (v2.0.0)
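
Applied to the model in the question, the same formula change might look like the sketch below. The data frame here is a made-up stand-in (20 simulated weeks with an arbitrary slope and noise level), since the real `df_linelist_yearweeks` was not posted; only the column names come from the question.

```r
# Hypothetical stand-in for the asker's data: the values are simulated,
# only the column names are taken from the question.
set.seed(1)
df_linelist_yearweeks <- data.frame(weeknumber = 1:20)
df_linelist_yearweeks$cm_cases <-
  3 * df_linelist_yearweeks$weeknumber + rnorm(20, sd = 2)

# "0 +" (or equivalently "- 1") drops the intercept term from the fit
model <- lm(cm_cases ~ 0 + weeknumber, data = df_linelist_yearweeks)

# Only a slope is estimated; there is no "(Intercept)" coefficient
coef(model)

# The fitted line passes through the origin
pred_at_zero <- predict(model, newdata = data.frame(weeknumber = 0))
pred_at_zero
```

With the real data, only the `lm()` call is needed; the simulation lines are there just to make the example run on its own.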

You may notice that R-squared is not the same in the original model as in the two no-intercept versions. The fitted line is now forced through $(0,0)$ instead of through $(\overline{x},\overline{y})$. Among other things, R-squared differs because the total sum of squares in y is no longer $\sum (y_{i} - \overline{y})^2$ but $\sum y_{i}^2$ in the last two versions. This is not the same model, so its R-squared should not be compared directly with the original one.
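
To make the point about the two sums of squares concrete, here is a small sketch that recomputes R-squared by hand for the `hp ~ disp` models from the earlier reply. The variable names (`rss1`, `r2_mod2_centered`, and so on) are mine, chosen for illustration.

```r
data(mtcars)

mod1 <- lm(hp ~ disp, data = mtcars)      # with intercept
mod2 <- lm(hp ~ 0 + disp, data = mtcars)  # forced through the origin

y    <- mtcars$hp
rss1 <- sum(residuals(mod1)^2)
rss2 <- sum(residuals(mod2)^2)

# With an intercept, the total sum of squares is taken about mean(y)
r2_mod1 <- 1 - rss1 / sum((y - mean(y))^2)

# Without an intercept, summary.lm() takes it about zero instead
r2_mod2_uncentered <- 1 - rss2 / sum(y^2)

# The same no-intercept fit scored against the centered total sum of squares
r2_mod2_centered <- 1 - rss2 / sum((y - mean(y))^2)

# Both hand computations should agree with what summary() reports
all.equal(r2_mod1, summary(mod1)$r.squared)
all.equal(r2_mod2_uncentered, summary(mod2)$r.squared)

# On the common (centered) scale, the origin-constrained fit cannot beat mod1
r2_mod2_centered < r2_mod1
```

The jump in reported R-squared for the no-intercept model comes from the change of baseline, not from a better fit.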

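The classic solution that leaves the fit itself unchanged is to center all the variables. Notice that the two fits below are identical in slopes, residuals, and R-squared; they differ only in the intercept row, where the centered fit has an intercept of zero to near computer precision (5.088e-16).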

For cars:

> summary(lm(mpg~wt+disp, data=mtcars))

Call:
lm(formula = mpg ~ wt + disp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4087 -2.3243 -0.7683  1.7721  6.3484

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
wt          -3.35082    1.16413  -2.878  0.00743 **
disp        -0.01773    0.00919  -1.929  0.06362 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.917 on 29 degrees of freedom
Multiple R-squared:  0.7809,	Adjusted R-squared:  0.7658
F-statistic: 51.69 on 2 and 29 DF,  p-value: 2.744e-10

> summary(lm(scale(mpg, scale=FALSE)~scale(wt, scale=FALSE)+scale(disp, scale=FALSE), data=mtcars))

Call:
lm(formula = scale(mpg, scale = FALSE) ~ scale(wt, scale = FALSE) +
    scale(disp, scale = FALSE), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4087 -2.3243 -0.7683  1.7721  6.3484

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                 5.088e-16  5.156e-01   0.000  1.00000
scale(wt, scale = FALSE)   -3.351e+00  1.164e+00  -2.878  0.00743 **
scale(disp, scale = FALSE) -1.772e-02  9.190e-03  -1.929  0.06362 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.917 on 29 degrees of freedom
Multiple R-squared:  0.7809,	Adjusted R-squared:  0.7658
F-statistic: 51.69 on 2 and 29 DF,  p-value: 2.744e-10
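
If the goal is a model with literally no intercept term, one option is to center first and then drop the intercept. The sketch below is my own variation on the output above, not part of the original reply: it centers into a new data frame (the name `centered` is just for illustration) and checks that the slopes and residuals match the uncentered fit.

```r
data(mtcars)

# Subtract the column means from the response and both predictors
centered <- as.data.frame(
  scale(mtcars[, c("mpg", "wt", "disp")], scale = FALSE)
)

fit_raw      <- lm(mpg ~ wt + disp, data = mtcars)        # intercept estimated
fit_centered <- lm(mpg ~ 0 + wt + disp, data = centered)  # intercept dropped

# The slopes agree up to floating-point error
all.equal(coef(fit_raw)[c("wt", "disp")], coef(fit_centered))

# So do the residuals
all.equal(unname(residuals(fit_raw)), unname(residuals(fit_centered)))

# Dropping the intercept frees one residual degree of freedom (30 vs 29),
# so standard errors and p-values shift slightly
df.residual(fit_raw)
df.residual(fit_centered)
```

Because of the extra degree of freedom, the standard errors from `fit_centered` are a little smaller than those shown above, even though the point estimates are the same.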
