recipes package cannot create interaction term in step_interact

To practice my modeling skills, I'm using a medical insurance data set that looks like this:

> insur_dt
      age    sex    bmi children smoker    region   charges
   1:  19 female 27.900        0    yes southwest 16884.924
   2:  18   male 33.770        1     no southeast  1725.552
   3:  28   male 33.000        3     no southeast  4449.462
   4:  33   male 22.705        0     no northwest 21984.471
   5:  32   male 28.880        0     no northwest  3866.855
  ---                                                      
1334:  50   male 30.970        3     no northwest 10600.548
1335:  18 female 31.920        0     no northeast  2205.981
1336:  18 female 36.850        0     no southeast  1629.833
1337:  21 female 25.800        0     no southwest  2007.945
1338:  61 female 29.070        0    yes northwest 29141.360

I'm using recipes as part of the tidymodels meta-package to prepare my data for use in a model, and I have determined that bmi, age, and smoker form an interaction term.

insur_split <- initial_split(insur_dt)

insur_train <- training(insur_split)
insur_test <- testing(insur_split)

# we are going to do data processing and feature engineering with recipes

# below, we are going to predict charges using age, bmi, and smoker
insur_rec <- recipe(charges ~ age + bmi + smoker, data = insur_train) %>%
    step_dummy(all_nominal()) %>%
    step_zv(all_numeric()) %>%
    step_normalize(all_numeric()) %>%
    step_interact(~ bmi:smoker:age) %>% 
    prep()

Per the tidymodels guide/documentation, I have to specify the interaction as a step in the recipe as step_interact. However, I am getting an error when I attempt to do so:

> insur_rec <- recipe(charges ~ age + bmi + smoker, data = insur_train) %>%
+     step_dummy(all_nominal()) %>%
+     step_zv(all_numeric()) %>%
+     step_normalize(all_numeric()) %>%
+     step_interact(~ bmi:smoker:age) %>% 
+     prep()
Interaction specification failed for: ~bmi:smoker:age. No interactions will be created.partial match of 'object' to 'objects'

I am new to modeling and am not quite sure why I am getting this error. I am simply trying to state that charges is explained by all other predictors, and that smoker (a yes/no factor), age (numeric), and bmi (double) all interact with each other to inform the outcome. What am I doing wrong?

From the documentation:

step_interact can create interactions between variables. It is primarily intended for numeric data; categorical variables should probably be converted to dummy variables using step_dummy() prior to being used for interactions.

step_dummy(all_nominal()) turned the variable smoker into smoker_yes, so by the time step_interact runs there is no longer a column named smoker to interact with. Below, you'll see that I just changed the name of smoker in the interaction term to smoker_yes.

insur_rec <- recipe(charges ~ bmi + age + smoker, data = insur_train) %>%
    step_dummy(all_nominal()) %>%
    step_normalize(all_numeric(), -all_outcomes()) %>%
    step_interact(terms = ~ bmi:age:smoker_yes) %>% 
    prep(verbose = TRUE, log_changes = TRUE)
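If you would rather not hard-code the dummy column name, the step_interact documentation also shows using a selector such as starts_with() inside the interaction formula. Here is a minimal sketch of that idea. Note that toy_train and toy_rec are hypothetical names, and the data is made up purely so the example runs on its own; it is not the insurance data from the question.

```r
library(recipes)

# Hypothetical toy data standing in for insur_train (all values are made up)
set.seed(1)
toy_train <- data.frame(
  age     = sample(18:64, 50, replace = TRUE),
  bmi     = runif(50, 18, 40),
  smoker  = factor(sample(c("no", "yes"), 50, replace = TRUE)),
  charges = runif(50, 1000, 30000)
)

toy_rec <- recipe(charges ~ age + bmi + smoker, data = toy_train) %>%
  step_dummy(all_nominal()) %>%
  step_normalize(all_numeric(), -all_outcomes()) %>%
  # starts_with() picks up whatever dummy columns step_dummy made from smoker
  step_interact(terms = ~ starts_with("smoker"):bmi:age) %>%
  prep()

# Inspect the column names the prepped recipe produces on the training data
baked_names <- names(bake(toy_rec, new_data = NULL))
print(baked_names)
```

Printing the baked column names is also a quick way to check what step_dummy actually called the new columns before you refer to them in a later step.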

Based on your experience, we'll improve the documentation to make this clearer.
