Tidymodels using tune_race_anova with a workflow


I have been looking at the finetune package to try out tune_race_anova(), based on the presentation given at rstudio::global 2021.

I am trying to get it to work with a workflow but am confusing myself.
Below is some pseudocode.
I'm getting stuck on the portion around tune_res_rf and on where to actually integrate the grid.
Does anyone have any ideas?

data_split <- initial_split(explore_data, strata = "tgt", prop = 0.75)

train_explore <- training(data_split)
test_explore  <- testing(data_split)
# Generate resamples and repeat
report_resamples <- vfold_cv(train_explore, v = 10, repeats = 1, strata = tgt)

# Set up the model definition
preprocess <- train_explore %>%
  recipe(tgt ~ .)

# BUILD A RANDOM FOREST MODEL ---------------------------------------------
rf_mod <- rand_forest(
  mtry = tune(),
  trees = tune(),
  min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger")

rf_grid <- dials::parameters(
  finalize(mtry(), select(explore_data, -tgt)),
  trees(),
  min_n())

tune_wf <- workflow() %>%
  add_recipe(preprocess) %>%
  add_model(rf_mod)

# Tune the models
no_cores <- parallel::detectCores() - 1

# This is the bit where I get stuck
tune_res_rf <- tune_race_anova(tune_wf,
                         resamples = report_resamples,
                         grid = rf_grid,
                         perf = metric_set(roc_auc, sens, spec, kap, accuracy))


Thank you for your time

I don't see any issues with that code (although I can't reproduce the results). Does an error occur?

Hi @Max

When I run this portion:

tune_res_rf <- tune_race_anova(tune_wf,
                         resamples = report_resamples,
                         grid = rf_grid,
                         perf = metric_set(roc_auc, sens, spec, kap, accuracy))

I get this error:

The provided grid has the following parameter columns that have not been marked for tuning by tune(): 'name', 'id', 'source', 'component', 'component_id', 'object'.

One thing I can see is that I have not set the grid size anywhere, but I am not sure where to set it either.

That's very odd. We'll need a small, reproducible example to test with. Can you substitute another data set to get the error?

Sure, please see below. There is a factor warning, but this isn't in my original model.
It takes under 2 minutes to run.

library(tidymodels)
library(finetune)

data(credit_data, package = "modeldata")

data_split <- initial_split(credit_data, strata = "Status", prop = 0.75)

train_explore <- training(data_split)
test_explore  <- testing(data_split)

# Generate resamples and repeat
report_resamples <- vfold_cv(train_explore, v = 10, repeats = 1, strata = Status)

# Set up the model definition
preprocess <- train_explore %>%
  recipe(Status ~ .) %>%
  themis::step_downsample(Status) %>%
  step_dummy(all_nominal())

# BUILD A RANDOM FOREST MODEL ---------------------------------------------
rf_mod <- rand_forest(
  mtry = tune(),
  trees = tune(),
  min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger")

rf_grid <- dials::parameters(
  finalize(mtry(), select(credit_data, -Status)),
  trees(),
  min_n())

tune_wf <- workflow() %>%
  add_recipe(preprocess) %>%
  add_model(rf_mod)

no_cores <- parallel::detectCores() - 1

tune_res_rf <- tune_race_anova(tune_wf,
                               resamples = report_resamples,
                               grid = rf_grid,
                               perf = metric_set(roc_auc, sens, kap, accuracy))
There were three problems:

  • There isn't a perf argument; I think you meant metrics. I didn't see that either when I looked at your code.
  • In the recipe, step_dummy(all_nominal()) was capturing the outcome. This happens a lot and the devel version of recipes has all_nominal_predictors(). Until then, use step_dummy(all_nominal(), -Status). However, the ranger package does not require dummy variables for predictors, so you can skip that if you want.
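  • The grid code returns a parameter set, not a grid of candidate values. You could pass it to the param_info argument instead, or build the grid with one of the dials grid functions, for example:
rf_grid <- dials::parameters(
   finalize(mtry(), select(credit_data, -Status)),
   trees(),
   min_n()) %>%
   grid_random(size = 10)  # illustrative choice; any dials grid_*() function works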
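
To see why the original grid triggered that error, it may help to compare the column names of a parameter set with those of an actual grid. This is only a minimal sketch: the mtry range of 1 to 13 is an assumed stand-in for the finalize() call, and grid_random() with size = 10 is just one possible grid function.

```r
library(dials)

# A parameter set describes the tuning parameters; it holds no candidate values.
# The mtry range is an assumption standing in for finalize(mtry(), <predictors>).
rf_params <- parameters(mtry(range = c(1L, 13L)), trees(), min_n())

# These are the columns that tune_race_anova() complained about
names(rf_params)

# A grid function turns the parameter set into a tibble of candidate values,
# with one column per parameter marked with tune()
rf_grid <- grid_random(rf_params, size = 10)
names(rf_grid)
```

The grid's columns match the arguments marked with tune() in the model specification, which is what the grid argument expects.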

One other thing... this data set has some missing values so you might want to add one of the imputation steps to the recipe (otherwise ranger will error).
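
A rough sketch of the imputation route, as an alternative to dropping incomplete rows. It assumes a recent recipes release, where the steps are named step_impute_median() and step_impute_mode() and the all_*_predictors() selectors are available; the choice of median and mode imputation is only an example.

```r
library(tidymodels)

data(credit_data, package = "modeldata")

# Impute numeric predictors with the median and nominal predictors with the mode.
# The *_predictors() selectors leave the outcome (Status) untouched.
impute_rec <- recipe(Status ~ ., data = credit_data) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_impute_mode(all_nominal_predictors())

# Estimate the imputation values and apply them to the training data
imputed <- impute_rec %>%
  prep() %>%
  bake(new_data = NULL)

# Check that no predictor still contains missing values
anyNA(select(imputed, -Status))
```

Placing these steps in the recipe means the imputation values are re-estimated within each resample rather than once on the full data.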

Here's my script:
library(tidymodels)
library(finetune)

data(credit_data, package = "modeldata")
credit_data <- credit_data %>% na.omit()

data_split <- initial_split(credit_data, strata = "Status", prop = 0.75)

train_explore <- training(data_split)
test_explore  <- testing(data_split)

# Generate resamples and repeat
report_resamples <- vfold_cv(train_explore, v = 10, repeats = 1, strata = Status)

# Set up the model definition
preprocess <- train_explore %>%
   recipe(Status ~ .) %>%
   step_dummy(all_nominal(), -Status)  # optional: ranger does not need dummy variables

# BUILD A RANDOM FOREST MODEL ---------------------------------------------
rf_mod <- rand_forest(
   mtry = tune(),
   trees = tune(),
   min_n = tune()) %>%
   set_mode("classification") %>%
   set_engine("ranger")

# Turn the parameter set into an actual grid of candidate values
rf_grid <- dials::parameters(
   finalize(mtry(), select(credit_data, -Status)),
   trees(),
   min_n()) %>%
   grid_random(size = 10)  # illustrative choice; any dials grid_*() function works

tune_wf <- workflow() %>%
   add_recipe(preprocess) %>%
   add_model(rf_mod)

tune_res_rf <- tune_race_anova(tune_wf,
                               resamples = report_resamples,
                               grid = rf_grid,
                               metrics = metric_set(roc_auc, sens, kap, accuracy))
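
The other option mentioned above, passing the parameter set through param_info and giving grid as an integer, might look roughly like this. Treat it as a sketch under assumptions: the seed, the 5 folds, the grid size of 5, and the formula preprocessor are all arbitrary choices to keep the example small, and the ranger package needs to be installed.

```r
library(tidymodels)
library(finetune)

data(credit_data, package = "modeldata")
credit_data <- na.omit(credit_data)

set.seed(123)  # arbitrary seed, only for reproducibility
folds <- vfold_cv(credit_data, v = 5, strata = Status)

rf_mod <- rand_forest(mtry = tune(), trees = tune(), min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger")

rf_wf <- workflow() %>%
  add_formula(Status ~ .) %>%
  add_model(rf_mod)

# Parameter set with the upper bound of mtry finalized against the predictors
rf_params <- dials::parameters(
  finalize(mtry(), select(credit_data, -Status)),
  trees(),
  min_n()
)

# grid is an integer here, so the candidate values are generated
# from the parameter set supplied in param_info
race_res <- tune_race_anova(
  rf_wf,
  resamples = folds,
  param_info = rf_params,
  grid = 5,
  metrics = metric_set(roc_auc, accuracy)
)

show_best(race_res, metric = "roc_auc")
```

This also answers the grid size question: with an integer grid, the size is set directly in the tune_race_anova() call.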

Thanks very much @Max
That did the trick

