What is the correct way to pass prepped recipes to a tidymodels workflow?

Hello everyone,

I'm currently trying out tidymodels for a project I'm on. So far, it's been great, but because my data is feature intensive, I wanted to add some filters to my recipes to do a base feature selection before sending it to XGBoost for further cleaning. My question is whether I should prep() a recipe (or better, use prepper() on my cross validation splits) and pass this prepped recipe as the recipe for my workflow, or if I should just use the default recipe without prep. Essentially, what I'm wondering is which option A or B is more efficient (i.e. repeats less work):

Option A:

prepped_recipe <- 
  recipe(y ~ ., data = train_data) %>%
  step_corr(all_predictors()) %>%
  prep()

some_workflow <-
  workflow() %>%
  add_model(some_mod) %>%
  add_recipe(prepped_recipe)

some_workflow %>%
  tune_grid(
    resamples = folds,
    grid = 20
  )

Option B:

unprepped_recipe <- 
  recipe(y ~ ., data = train_data) %>%
  step_corr(all_predictors())

some_workflow <-
  workflow() %>%
  add_model(some_mod) %>%
  add_recipe(unprepped_recipe)

some_workflow %>%
  tune_grid(
    resamples = folds,
    grid = 20
  )

Assume folds is just vfold_cv(train, v = 10), where train is the training(split) from an initial_split() on the dataset, and grid is some arbitrary parameter grid for the model.

I think it's Option A, as the recipe is prepped!

Thanks, that's what I was thinking as well. I guess my follow-up question is: why don't more people use the prepped recipe in their demos? (Every demo I see uses option B.) Am I missing something here?

Take a look at this book by Julia Silge: https://www.tmwr.org. I'm sure most of your questions will be answered there. She always preps her recipes, and she is part of the team working on the tidymodels meta-package.

Overall, we suggest that you do not give a workflow a prepped recipe.

It is going to be re-prepped within resampling anyway.

Also, there might be something that breaks the connection between what the workflow would have prepped versus the pre-prepped version. Let the workflow take care of it.
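
To make the advice above concrete, here is a minimal sketch of the "Option B" pattern on a toy dataset. The use of mtcars, linear_reg(), fit_resamples() in place of tune_grid(), and the 0.9 correlation threshold are all assumptions chosen for illustration, not part of the original question. The recipe is left unprepped, and the resampling function estimates it on the analysis portion of each fold.

```r
library(tidymodels)

set.seed(123)
# Illustrative resamples; the original post uses vfold_cv(train, v = 10)
folds <- vfold_cv(mtcars, v = 5)

# Unprepped recipe: no call to prep() here
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_corr(all_predictors(), threshold = 0.9)

# The workflow bundles the raw recipe with the model
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# The recipe is estimated within each resample, then the model is fit
res <- fit_resamples(wf, resamples = folds)
collect_metrics(res)
```

Because the correlation filter is re-estimated per fold, the set of retained predictors can differ between folds, which is part of what the resampled performance estimate is meant to capture.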

Thank you for the reference, this is a good book. I wanted to point out that though she does prep her recipes, I can't find an example in the book where she passes the prepped recipe to the workflow. As Max says below, it seems it would be redundant to do so anyway.

But thank you for the resource, I'll definitely refer to it over the next couple of days.
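
For anyone wondering where prep() still fits in: a rough sketch, assuming the toy mtcars data and an arbitrary 0.8 threshold, of prepping a recipe on its own to inspect what it does. The prepped object is only looked at here and is not passed to a workflow.

```r
library(tidymodels)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_corr(all_predictors(), threshold = 0.8)

# Estimate the recipe on the training data for inspection only
prepped <- prep(rec, training = mtcars)

# Which predictors did step_corr() decide to remove?
removed <- tidy(prepped, number = 1)
removed

# The processed training set, as the model would see it
baked <- bake(prepped, new_data = NULL)
names(baked)
```

The workflow itself would still receive the unprepped rec object.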

Thanks Max! This makes sense. So I have three models I want to compare performance for, and have three separate workflows. It would be nice if there was a way that it only has to prep the data once and then can run the different workflows instead of prepping it three different times in each workflow. Is there a nice way of handling this?

There are some functions in workflows that do this, such as .fit_pre() and .fit_model(), but they are developer-oriented (and are used inside of tune). We don't support them outside of that context.
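
For the curious, a rough sketch of how those staged helpers split up what fit() does in one call. This is for illustration only, since they are not supported for end-user code as noted above. The toy data and model are assumptions, and .fit_finalize() and control_workflow() are additional workflows functions not mentioned in the reply.

```r
library(tidymodels)

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_corr(all_predictors(), threshold = 0.9)

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# Stage 1: run the preprocessor only
wf_pre <- .fit_pre(wf, mtcars)

# Stage 2: fit the model on the preprocessed data
wf_mod <- .fit_model(wf_pre, control_workflow())

# Stage 3: mark the workflow as trained
wf_staged <- .fit_finalize(wf_mod)

# The usual, supported one-step route
wf_fit <- fit(wf, data = mtcars)
```

Both routes should give the same fitted model; in everyday code, fit() and the tune functions are the supported entry points.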
