Hello everyone,
I'm currently trying out tidymodels for a project. So far it's been great, but because my data has a lot of features, I want to add some filters to my recipe to do a basic feature selection before handing the remaining features to XGBoost. My question is whether I should `prep()` the recipe (or, better, use `prepper()` on my cross-validation splits) and pass the prepped recipe to my workflow, or whether I should just add the unprepped recipe. Essentially, I'm wondering which option, A or B, is more efficient (i.e., repeats less work):
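For reference, the `prepper()` route mentioned above can be sketched like this. It is a minimal sketch, assuming the built-in `mtcars` data stands in for the real dataset and that a correlation `threshold` of 0.9 is a reasonable illustration; neither comes from the project itself. `prepper()` from rsample preps the recipe on the analysis set of each split, so every fold gets its own correlation filter.

```r
library(tidymodels)

set.seed(123)
# mtcars is a stand-in for the real data (assumption for illustration)
data_split <- initial_split(mtcars, prop = 0.8)
train_data <- training(data_split)
folds <- vfold_cv(train_data, v = 10)

# Correlation filter; threshold = 0.9 is an illustrative choice
unprepped_recipe <-
  recipe(mpg ~ ., data = train_data) %>%
  step_corr(all_predictors(), threshold = 0.9)

# prepper() estimates the recipe on the analysis set of each split
folds$recipes <- map(folds$splits, prepper, recipe = unprepped_recipe)

# Predictors removed by the filter on the first fold
tidy(folds$recipes[[1]], number = 1)
```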
Option A:
```r
library(tidymodels)

prepped_recipe <-
  recipe(y ~ ., data = train_data) %>%
  step_corr(all_predictors()) %>%
  prep()

some_workflow <-
  workflow() %>%
  add_model(some_mod) %>%
  add_recipe(prepped_recipe)

some_workflow %>%
  tune_grid(
    resamples = folds,
    grid = 20
  )
```
Option B:
```r
unprepped_recipe <-
  recipe(y ~ ., data = train_data) %>%
  step_corr(all_predictors())

some_workflow <-
  workflow() %>%
  add_model(some_mod) %>%
  add_recipe(unprepped_recipe)

some_workflow %>%
  tune_grid(
    resamples = folds,
    grid = 20
  )
```
Assume `folds` is just `vfold_cv(train_data, v = 10)`, where `train_data` is `training(split)` from an `initial_split()` on the dataset, and `grid` is some arbitrary parameter grid for the model.
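A small, runnable version of Option B under that setup might look like the following. It is a sketch that assumes `mtcars` in place of the real data, a plain `linear_reg()` with the `"lm"` engine in place of XGBoost (so no extra engine is needed), and the `threshold` of `step_corr()` marked with `tune()` so the explicit grid has something to search; none of these choices come from the original project. Because the recipe is added unprepped, `tune_grid()` estimates it on each fold's analysis set for each threshold candidate.

```r
library(tidymodels)

set.seed(123)
# mtcars stands in for the real dataset (assumption)
data_split <- initial_split(mtcars, prop = 0.8)
train_data <- training(data_split)
folds <- vfold_cv(train_data, v = 10)

# Unprepped recipe; the filter threshold is left for tune_grid() to set
unprepped_recipe <-
  recipe(mpg ~ ., data = train_data) %>%
  step_corr(all_predictors(), threshold = tune())

# Plain linear model instead of XGBoost (assumption for illustration)
some_mod <- linear_reg() %>% set_engine("lm")

some_workflow <-
  workflow() %>%
  add_model(some_mod) %>%
  add_recipe(unprepped_recipe)

# Explicit grid over the recipe's threshold parameter
tuned <-
  some_workflow %>%
  tune_grid(
    resamples = folds,
    grid = tibble(threshold = c(0.7, 0.8, 0.9))
  )

collect_metrics(tuned)
```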