How to model a regression interaction using tidymodels

Hi,

I am looking at a data set across groups to try and predict \hat{y} using tidymodels. It would seem to be a good candidate for the MARs algorithm as it automatically finds interactions. My over arching plan was to run MARs then use a pdp (myabe using the pdp package) to plot the obvious interaction which i artificially created in grp_1 and grp_2. Unfortunately i clearly have chosen the wrong model or I just cant see how to set it up without splitting the data myself manually and then running a regression separately. Can anyone help? Is there anyway to disentangle this without manually splitting the groups?

Thanks very much for your time

library(tidymodels)
library(tidyverse)
options(scipen = 999)

grp_1 <- seq(2, -2,length.out = 50) %>% 
  enframe() %>% 
  mutate(grp = 1) %>% 
  mutate(y_hat = seq(16, 4,length.out = 50))

grp_2 <- seq(2, -2,length.out = 50) %>% 
  enframe() %>% 
  mutate(grp = 2) %>% 
  mutate(y_hat = seq(4, 16,length.out = 50))

mydf <- bind_rows(grp_1, grp_2) %>% 
  select(x = value, grp, y_hat) %>% 
  mutate(grp = as.character(grp))

norm_recipe <- recipe(y_hat ~ ., data = mydf) %>%
  step_dummy(grp) %>% 
  prep(training = mydf, retain = TRUE)

juice(norm_recipe) %>% glimpse()

fit_mars <- mars() %>% 
  set_engine("earth") %>% 
  set_mode("regression") %>% 
  fit(y_hat ~ ., data = juice(norm_recipe))

test_results <- predict(fit_mars, new_data = juice(norm_recipe)) %>%
  rename(fit_mars = .pred) %>% 
  bind_cols(y_hat = mydf$y_hat)

ggplot(mydf, aes(x=x, y=y_hat, colour = grp)) +
  geom_hline(yintercept=test_results$fit_mars) +
  geom_point() +
  theme_minimal() +
  ggtitle('Relationship between x and y_hat with a made up interaction')

The default of the . means to include all the variables, but wouldnt cover interaction, you should try .*. instead

fit(y_hat ~ .*., data = juice(norm_recipe))
1 Like

See step_interact() as well as the package vignette for dummy variables and interactions in recipes

1 Like

Hi Both,

Thanks very much for helping me out with this. I really like using the packages.
@nirgrahamuk, thank you very much for your reply, I was not aware of this at all.
@Max I went through the docs and was able to get it to work with your method as well.

I have one last question in relation to getting the pdp up and running, if you have the time

If i produce the code below to see the model with respect to the interaction I get the following error

Error: Did you mean to use new_data instead of newdata?

# The interaction is the number one predictor which is great
library(vip)
vip(fit_lm)

# Now we have a model which captures everything 100% correctly
# Produce the partial dependency plot for the interaction.
library(pdp)
partial(fit_lm, pred.data = mydf, pred.var = "grp_X2_x_x", 
        train = juice(norm_recipe), type = "regression")

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.