Help using option_add to update parameter ranges in a workflowset

I'm running into an issue using option_add to update parameter ranges for a model spec in my workflowset. I want to update the range of the mtry parameter, which only applies to 1 of my model specifications, xgb_spec. Using this as a reference (r - Tune recipe in workflow set with custom range (or value) - Stack Overflow) I set the range then update it using option_add, specifying the relevant model spec in the id argument.

But when I run workflow_map, I get an error: All options should be named. In the option_add documentation, there is a ... argument described as "a list of named options." That seems related to the error I'm getting, but I am not sure what to add here, or whether I even need to add anything at all.

I'm not sure if I'm getting that error because I'm only trying to update 1 model spec instead of all 3 (maybe that means the options for the other 2 are not named?), because I am not naming the option I am updating, or because of some combination of both.

Reprex below:

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.1.1
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
#> Warning: package 'dials' was built under R version 4.1.1
#> Warning: package 'ggplot2' was built under R version 4.1.1
#> Warning: package 'infer' was built under R version 4.1.1
#> Warning: package 'parsnip' was built under R version 4.1.1
#> Warning: package 'tune' was built under R version 4.1.1
#> Warning: package 'workflows' was built under R version 4.1.1
#> Warning: package 'workflowsets' was built under R version 4.1.1
#> Warning: package 'yardstick' was built under R version 4.1.1
library(discrim)
#> Warning: package 'discrim' was built under R version 4.1.1
#> 
#> Attaching package: 'discrim'
#> The following object is masked from 'package:dials':
#> 
#>     smoothness
library(workflowsets)
library(finetune)
#> Warning: package 'finetune' was built under R version 4.1.1

data(parabolic)

set.seed(1)
split <- initial_split(parabolic)
train_set <- training(split)
test_set <- testing(split)
set.seed(2)
train_resamples <- bootstraps(train_set, times = 5)

mars_disc_spec <- 
  discrim_flexible(prod_degree = tune()) %>% 
  set_engine("earth")

reg_disc_sepc <- 
  discrim_regularized(frac_common_cov = tune(),
                      frac_identity = tune()) %>% 
  set_engine("klaR")

cart_spec <- 
  decision_tree(cost_complexity = tune(),
                min_n = tune()) %>% 
  set_engine("rpart") %>% 
  set_mode("classification")

xgb_spec <-  
  boost_tree(
    trees = 700, 
    tree_depth = tune(),
    min_n = tune(), 
    loss_reduction = tune(),
    sample_size = tune(),
    mtry = tune(),        
    learn_rate = tune()
  ) %>% 
  set_engine("xgboost") %>% 
  set_mode("classification")

mtry_param <-
  parameters(mtry()) %>%
  update(mtry = mtry(c(0, 20))) 

all_workflows <- 
  workflow_set(
    preproc = list("formula" = class ~ .),
    models = list(regularized = reg_disc_sepc, xgb = xgb_spec, cart = cart_spec)
  ) %>%
  option_add(mtry_param, id = "formula_xgb")

class_metrics <- 
  metric_set(roc_auc, accuracy, sens, spec, mn_log_loss)

race_ctrl <-
  control_race(
    verbose = TRUE,
    allow_par = TRUE,
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = TRUE
  )

doParallel::registerDoParallel()

wf_res <- 
  all_workflows %>% 
  workflow_map(fn = "tune_race_anova",
               resamples = train_resamples,
               grid = 10,
               metrics = class_metrics, 
               control = race_ctrl
  )
#> Error: Problem with `mutate()` column `option`.
#> i `option = purrr::map(option, append_options, dots)`.
#> x All options should be named.
#> Execution stopped; returning current results
Created on 2021-10-22 by the reprex package (v2.0.1)

When I execute this bit of code, I get the "x All options should be named" error. Everything passed through the ... argument of option_add has to be named. Since you are updating the param_info argument, you should change the line to

option_add(param_info = mtry_param, id = "formula_xgb")

However, there are two other issues. I would get the whole set of parameters to pass in using:

mtry_param <-
  xgb_spec %>% 
  parameters() %>%  # the full parameter set, not just the mtry() parameter
  update(mtry = mtry(c(1, 2))) 

The data set has two predictors so I changed mtry(c(0,20)) to mtry(c(1,2)) (it has to be > 0).
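Putting the two changes together, a minimal sketch might look like the following. It assumes a recent tidymodels release, where extract_parameter_set_dials() replaces the parameters() call used above (on the package versions in the reprex, keep parameters()). The specs are trimmed-down versions of the ones in the reprex, and xgb_param is just an illustrative name.

```r
library(tidymodels)
library(workflowsets)

# Trimmed-down versions of the specs from the reprex
xgb_spec <-
  boost_tree(trees = 700, mtry = tune(), min_n = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

cart_spec <-
  decision_tree(cost_complexity = tune(), min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

# Full parameter set for the xgboost spec, with only the mtry range changed
xgb_param <-
  xgb_spec %>%
  extract_parameter_set_dials() %>%
  update(mtry = mtry(c(1, 2)))

all_workflows <-
  workflow_set(
    preproc = list(formula = class ~ .),
    models = list(xgb = xgb_spec, cart = cart_spec)
  ) %>%
  # The option is named after the tuning-function argument it fills
  option_add(param_info = xgb_param, id = "formula_xgb")

# Inspect what was stored for that workflow
all_workflows$option[[which(all_workflows$wflow_id == "formula_xgb")]]
```

Only the formula_xgb row picks up the param_info option; the cart workflow keeps its defaults.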

Thanks! The suggestions are very helpful.

For this part though, in my real code I have more than 1 model spec where I want to update mtry, and I was hoping to be able to batch-update the mtry() parameter for all of them at once by passing in all their workflow IDs (I have a similar use case where I want to batch-update pca_comps for only the models using a recipe with step_pca()).

If I understand this correctly, it seems I need to update one model at a time. But is there a way to, for example, update mtry for all models in my workflowset that have that as a parameter?

No. That might be difficult to do since not all models have the same parameters (even if they share mtry()). You will have to write a loop.
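One way such a loop might look, as a rough sketch rather than a built-in feature: walk over the workflow IDs, pull each workflow's full parameter set, and add a param_info option only where mtry is present. This assumes a recent tidymodels release that provides extract_workflow() and extract_parameter_set_dials(). The rf_spec model and the mtry_range object are hypothetical, added only so that more than one workflow has mtry.

```r
library(tidymodels)
library(workflowsets)

xgb_spec <-
  boost_tree(trees = 700, mtry = tune(), min_n = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

# Hypothetical second model that also tunes mtry
rf_spec <-
  rand_forest(mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

cart_spec <-
  decision_tree(cost_complexity = tune(), min_n = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

all_workflows <-
  workflow_set(
    preproc = list(formula = class ~ .),
    models = list(xgb = xgb_spec, rf = rf_spec, cart = cart_spec)
  )

# Range to apply wherever mtry is tuned (two predictors in the data)
mtry_range <- mtry(c(1, 2))

for (wf_id in all_workflows$wflow_id) {
  # Full parameter set for this workflow
  params <-
    all_workflows %>%
    extract_workflow(id = wf_id) %>%
    extract_parameter_set_dials()

  # Only touch workflows that actually tune mtry
  if ("mtry" %in% params$id) {
    all_workflows <-
      all_workflows %>%
      option_add(param_info = update(params, mtry = mtry_range), id = wf_id)
  }
}
```

Because each workflow gets its own complete parameter set, the models do not need to share any parameters other than mtry. The same pattern should carry over to a recipe parameter by checking for its id instead.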

If you are not tuning anything in the recipe, you can use an integer for the grid argument and the function can figure out mtry for you.
