# How to model a regression interaction using tidymodels

Hi,

I am looking at a data set across groups to try and predict \hat{y} using tidymodels. It would seem to be a good candidate for the MARs algorithm as it automatically finds interactions. My over arching plan was to run MARs then use a pdp (myabe using the pdp package) to plot the obvious interaction which i artificially created in grp_1 and grp_2. Unfortunately i clearly have chosen the wrong model or I just cant see how to set it up without splitting the data myself manually and then running a regression separately. Can anyone help? Is there anyway to disentangle this without manually splitting the groups?

Thanks very much for your time

library(tidymodels)
library(tidyverse)
options(scipen = 999)

grp_1 <- seq(2, -2,length.out = 50) %>%
enframe() %>%
mutate(grp = 1) %>%
mutate(y_hat = seq(16, 4,length.out = 50))

grp_2 <- seq(2, -2,length.out = 50) %>%
enframe() %>%
mutate(grp = 2) %>%
mutate(y_hat = seq(4, 16,length.out = 50))

mydf <- bind_rows(grp_1, grp_2) %>%
select(x = value, grp, y_hat) %>%
mutate(grp = as.character(grp))

norm_recipe <- recipe(y_hat ~ ., data = mydf) %>%
step_dummy(grp) %>%
prep(training = mydf, retain = TRUE)

juice(norm_recipe) %>% glimpse()

fit_mars <- mars() %>%
set_engine("earth") %>%
set_mode("regression") %>%
fit(y_hat ~ ., data = juice(norm_recipe))

test_results <- predict(fit_mars, new_data = juice(norm_recipe)) %>%
rename(fit_mars = .pred) %>%
bind_cols(y_hat = mydf$y_hat) ggplot(mydf, aes(x=x, y=y_hat, colour = grp)) + geom_hline(yintercept=test_results$fit_mars) +
geom_point() +
theme_minimal() +
ggtitle('Relationship between x and y_hat with a made up interaction')



The default of the . means to include all the variables, but wouldnt cover interaction, you should try .*. instead

fit(y_hat ~ .*., data = juice(norm_recipe))
1 Like

See step_interact() as well as the package vignette for dummy variables and interactions in recipes

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

Hi Both,

Thanks very much for helping me out with this. I really like using the packages.
@nirgrahamuk, thank you very much for your reply, I was not aware of this at all.
@Max I went through the docs and was able to get it to work with your method as well.

I have one last question in relation to getting the pdp up and running, if you have the time

If i produce the code below to see the model with respect to the interaction I get the following error

Error: Did you mean to use new_data instead of newdata?

# The interaction is the number one predictor which is great
library(vip)
vip(fit_lm)

# Now we have a model which captures everything 100% correctly
# Produce the partial dependency plot for the interaction.
library(pdp)
partial(fit_lm, pred.data = mydf, pred.var = "grp_X2_x_x",
train = juice(norm_recipe), type = "regression")