Specifying formulas to control the form of interactions using `lm`

Here's a simple example that illustrates my question:

df <- data.frame(y = rnorm(10), x = rnorm(10), z = sample(c("a","b"), size = 10, replace = TRUE))

Using the `*` operator gives me a regression of y on 1, 1[z = b], x, and 1[z = b]x.

> lm(data = df, y ~ as.factor(z)*x)

Call:
lm(formula = y ~ as.factor(z) * x, data = df)

Coefficients:
    (Intercept)    as.factor(z)b                x  as.factor(z)b:x  
        -0.2351           0.1524           0.2309          -0.2699

I would like to regress y on 1[z = a], 1[z = a]x, 1[z = b], and 1[z = b]x (with no constant term). This regression will produce the same fitted values as the one above, but the interpretation of the coefficients is different, and preferable in some cases. How can I specify the formula to do this in a single regression?

In formulas, -1 or +0 specifies no intercept.

Maybe this does what you want:

set.seed(1)
df <-
  data.frame(
    y = rnorm(10),
    x = rnorm(10),
    z = factor(sample(c("a", "b"), size = 10, replace = TRUE))
  )

model_1 <- lm(y ~ z * x, data = df)
summary(model_1)
#> 
#> Call:
#> lm(formula = y ~ z * x, data = df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.2091 -0.2834  0.0853  0.3167  0.7597 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -0.3996     0.4209  -0.950    0.379
#> zb            0.7696     0.4946   1.556    0.171
#> x             0.7982     0.6132   1.302    0.241
#> zb:x         -1.2128     0.6526  -1.858    0.112
#> 
#> Residual standard error: 0.6726 on 6 degrees of freedom
#> Multiple R-squared:  0.505,  Adjusted R-squared:  0.2575 
#> F-statistic: 2.041 on 3 and 6 DF,  p-value: 0.2098

model_2 <- lm(y ~ z * x - 1, data = df)
summary(model_2)
#> 
#> Call:
#> lm(formula = y ~ z * x - 1, data = df)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.2091 -0.2834  0.0853  0.3167  0.7597 
#> 
#> Coefficients:
#>      Estimate Std. Error t value Pr(>|t|)
#> za    -0.3996     0.4209  -0.950    0.379
#> zb     0.3700     0.2599   1.424    0.204
#> x      0.7982     0.6132   1.302    0.241
#> zb:x  -1.2128     0.6526  -1.858    0.112
#> 
#> Residual standard error: 0.6726 on 6 degrees of freedom
#> Multiple R-squared:  0.5203, Adjusted R-squared:  0.2005 
#> F-statistic: 1.627 on 4 and 6 DF,  p-value: 0.2827

Created on 2023-03-19 with reprex v2.0.2

If you want a fuller parameterization you might need to make the indicators yourself.
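One way to build those indicators without typing each one out is to let `model.matrix()` generate them. This is only a minimal sketch, assuming the `df` from the reprex above; the names `ind`, `slopes`, and `fit_manual` are made up for illustration.

```r
set.seed(1)
df <- data.frame(
  y = rnorm(10),
  x = rnorm(10),
  z = factor(sample(c("a", "b"), size = 10, replace = TRUE))
)

# One 0/1 column per level of z (no intercept, so no level is dropped)
ind <- model.matrix(~ 0 + z, data = df)

# Multiply every indicator column by x: one slope regressor per level
slopes <- ind * df$x
colnames(slopes) <- paste0(colnames(ind), "_x")

# Group-specific intercepts and slopes in a single regression
fit_manual <- lm(df$y ~ 0 + ind + slopes)
coef(fit_manual)
```

Because the columns come from the factor levels, the same lines should work unchanged when `z` has more than two levels.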


Thanks. That's not quite what I want, because I also want to "remove the intercept" on the x term, so that the coefficients are za, zb, za:x, and zb:x. I guess I may have to construct them manually, as you suggest.

Doesn't Max's second model do what you want?

No, because it still has a "main effect" for x plus the incremental difference for group b.
Here's how you would do what I want by hand (by creating new variables, which is what I was hoping to avoid).

set.seed(1)
df <- data.frame(y = rnorm(10), x = rnorm(10), z = factor(sample(c("a","b"))),
                 size = 10, replace = TRUE)

df$xa <- df$x * (df$z == "a")
df$xb <- df$x * (df$z == "b")

# Max's solution
> lm(data = df, y ~ 0 + x*z)

Call:
lm(formula = y ~ 0 + x * z, data = df)

Coefficients:
       x        za        zb      x:zb  
-0.44849   0.24849  -0.09732   0.59642  

> lm(data = df, y ~ 0 + z + xa + xb)

Call:
lm(formula = y ~ 0 + z + xa + xb, data = df)

Coefficients:
      za        zb        xa        xb  
 0.24849  -0.09732  -0.44849   0.14793  

My desired parameterization yields the same coefficient estimates as running two separate regressions, one on each subset defined by the value of z. But it is often useful to have all of the coefficients in a single regression.
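The correspondence with per-group regressions can be checked directly. The sketch below assumes the data frame from Max's reprex (so the numbers will differ from the output printed above), and the object names `fit_single`, `fit_a`, and `fit_b` are hypothetical.

```r
set.seed(1)
df <- data.frame(
  y = rnorm(10),
  x = rnorm(10),
  z = factor(sample(c("a", "b"), size = 10, replace = TRUE))
)

# Hand-built group-specific slope regressors, as in the post above
df$xa <- df$x * (df$z == "a")
df$xb <- df$x * (df$z == "b")

# Single regression with separate intercepts and slopes per group
fit_single <- lm(y ~ 0 + z + xa + xb, data = df)

# Two separate regressions, one per level of z
fit_a <- lm(y ~ x, data = df, subset = z == "a")
fit_b <- lm(y ~ x, data = df, subset = z == "b")

# Compare the point estimates group by group
all.equal(unname(coef(fit_single)[c("za", "xa")]), unname(coef(fit_a)))
all.equal(unname(coef(fit_single)[c("zb", "xb")]), unname(coef(fit_b)))
```

Only the point estimates match. The standard errors will generally differ, because the single regression pools the residual variance across both groups.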

But your handcrafted example is a single regression, and you seem to know how to specify it, so what are you asking for help with?

Are you hoping to automate this part in some way?

df$xa <- df$x * (df$z == "a")
df$xb <- df$x * (df$z == "b")

Yes, this was just intended as a simple MWE. My question is whether there's functionality within formula specification syntax that can be used to automate this in more complicated examples.
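For what it's worth, the formula syntax may already cover this: dropping the main effect of `x` and keeping only `z` and `z:x`, or equivalently using the nesting operator `z/x`, should give one intercept and one slope per level. A sketch, assuming the data frame from Max's reprex; the names `fit_nested` and `fit_nested2` are illustrative only.

```r
set.seed(1)
df <- data.frame(
  y = rnorm(10),
  x = rnorm(10),
  z = factor(sample(c("a", "b"), size = 10, replace = TRUE))
)

# No intercept, no main effect of x: intercept and slope for each level of z
fit_nested <- lm(y ~ 0 + z + z:x, data = df)

# z/x is shorthand for z + z:x, so this fits the same model
fit_nested2 <- lm(y ~ 0 + z/x, data = df)

names(coef(fit_nested))

# Same fitted values as the usual z * x parameterization
all.equal(fitted(fit_nested), fitted(lm(y ~ z * x, data = df)))
```

With several continuous regressors, something like `y ~ 0 + z + z:(x1 + x2)` should extend the same idea (here `x1` and `x2` are placeholder names), though it is worth checking `names(coef())` on your own data to confirm the parameterization.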
