Help using `tune_sim_anneal` with a specified grid

I'm running into an issue using a custom grid with tune_sim_anneal(). I'm trying to follow the approach I saw in this workflowsets issue, using option_add() to specify the grid I want to use for a given workflow ID.

When I use this approach with tune_grid() it works, but when I use tune_sim_anneal() I get an error. See the reprex below:

library(tidymodels)
#> Warning: package 'tidymodels' was built under R version 4.1.3
#> Warning: package 'broom' was built under R version 4.1.3
#> Warning: package 'dials' was built under R version 4.1.3
#> Warning: package 'scales' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'infer' was built under R version 4.1.3
#> Warning: package 'modeldata' was built under R version 4.1.3
#> Warning: package 'parsnip' was built under R version 4.1.3
#> Warning: package 'purrr' was built under R version 4.1.3
#> Warning: package 'recipes' was built under R version 4.1.3
#> Warning: package 'rsample' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'tidyr' was built under R version 4.1.3
#> Warning: package 'tune' was built under R version 4.1.3
#> Warning: package 'workflows' was built under R version 4.1.3
#> Warning: package 'workflowsets' was built under R version 4.1.3
#> Warning: package 'yardstick' was built under R version 4.1.3

data(parabolic)

set.seed(1)
split <- initial_split(parabolic)
train_set <- training(split)
test_set <- testing(split)

set.seed(2)
train_resamples <- bootstraps(train_set, times = 5)

logistic_reg_spec <- 
  logistic_reg(penalty = tune(),
               mixture = tune()) %>% 
  set_engine("glmnet")

workflow <- 
  workflow_set(
    preproc = list("formula" = class ~ .),
    models = list(lm = logistic_reg_spec)
  ) %>%
  option_add(
    id = "formula_lm",
    grid = grid_max_entropy(
      extract_parameter_set_dials(logistic_reg_spec),
      size = 10
    )
  )

# confirm grid is there
workflow %>%
  unnest(option) %>%
  select(option) %>%
  unnest(option)
#> # A tibble: 10 x 2
#>     penalty mixture
#>       <dbl>   <dbl>
#>  1 4.84e- 5  0.0522
#>  2 1.17e-10  0.872 
#>  3 8.92e- 7  0.941 
#>  4 1.02e- 3  0.645 
#>  5 2.97e- 1  0.496 
#>  6 9.18e- 7  0.300 
#>  7 2.21e-10  0.0617
#>  8 5.89e-10  0.441 
#>  9 6.33e- 1  0.942 
#> 10 6.71e- 1  0.118

grid_search_results <-
  workflow %>%
  workflowsets::workflow_map(
    seed = 1503,
    fn = "tune_grid",
    resamples = train_resamples,
    metrics = metric_set(roc_auc)
  )
#> Warning: package 'glmnet' was built under R version 4.1.3

# grid search with custom grid looks ok
grid_search_results
#> # A workflow set/tibble: 1 x 4
#>   wflow_id   info             option    result   
#>   <chr>      <list>           <list>    <list>   
#> 1 formula_lm <tibble [1 x 4]> <opts[3]> <tune[+]>

sim_anneal_results <-
  workflow %>%
  workflowsets::workflow_map(
    seed = 9999,
    fn = "tune_sim_anneal",
    resamples = train_resamples,
    metrics = metric_set(roc_auc)
  )
#> Warning: The `...` are not used in this function but one or more objects were
#> passed: 'grid'

# there's an error using tune_sim_anneal with a custom grid
sim_anneal_results %>%
  unnest(result) %>% 
  select(result) %>% 
  unnest(result)
#> # A tibble: 1 x 1
#>   result                                                                        
#>   <try-errr>                                                                    
#> 1 Error in tune_sim_anneal_workflow(object, resamples = resamples, iter = iter,~

Thanks for the community post and the helpful reprex!

You're seeing the warning:

#> Warning: The `...` are not used in this function but one or more objects were
#> passed: 'grid'

because the grid option stored in the workflow set is passed on to tune_sim_anneal(), which doesn't take a grid argument, so it ends up unused in `...`.
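One way to check this before mapping over a workflow set is to look at the arguments of the workflow method behind each tuning function. The snippet below is only a rough sketch: method_args() and has_arg() are hypothetical helpers written for this reply, not part of tune or finetune, and it assumes both packages are installed.

```r
library(tune)      # provides tune_grid()
library(finetune)  # provides tune_sim_anneal()

# Hypothetical helper: argument names of the workflow method
# registered for a tuning generic.
method_args <- function(generic) {
  names(formals(getS3method(generic, "workflow")))
}

# Hypothetical helper: does that method have a given named argument?
has_arg <- function(generic, arg) {
  arg %in% method_args(generic)
}

has_arg("tune_grid", "grid")
has_arg("tune_sim_anneal", "grid")
has_arg("tune_sim_anneal", "initial")
```

Any option that is not a named argument of the method lands in `...`, which is what triggers the warning for tune_sim_anneal().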

tune_sim_anneal() does take an initial argument, which can be the result of a previous tuning call (one that you've already carried out via grid_search_results). If you'd like to use the results from that maximum entropy search as the initial results for simulated annealing, you can add them as an option to your workflow set like so:

library(tidymodels)

data(parabolic)

set.seed(1)
split <- initial_split(parabolic)
train_set <- training(split)
test_set <- testing(split)

set.seed(2)
train_resamples <- bootstraps(train_set, times = 5)

logistic_reg_spec <- 
  logistic_reg(penalty = tune(),
               mixture = tune()) %>% 
  set_engine("glmnet")

workflow <- 
  workflow_set(
    preproc = list("formula" = class ~ .),
    models = list(lm = logistic_reg_spec)
  ) %>%
  option_add(
    id = "formula_lm",
    grid = grid_max_entropy(
      extract_parameter_set_dials(logistic_reg_spec),
      size = 10
    )
  )

grid_search_results <-
  workflow %>%
  workflowsets::workflow_map(
    seed = 1503,
    fn = "tune_grid",
    resamples = train_resamples,
    metrics = metric_set(roc_auc)
  )

# grid search with custom grid looks ok
grid_search_results
#> # A workflow set/tibble: 1 × 4
#>   wflow_id   info             option    result   
#>   <chr>      <list>           <list>    <list>   
#> 1 formula_lm <tibble [1 × 4]> <opts[3]> <tune[+]>

workflow_set_new <-
  workflow %>%
  option_remove(grid) %>%
  option_add(initial = grid_search_results$result[[1]])

sim_anneal_results <-
  workflow_set_new %>%
  workflowsets::workflow_map(
    seed = 9999,
    fn = "tune_sim_anneal",
    resamples = train_resamples,
    metrics = metric_set(roc_auc)
  )
#> Optimizing roc_auc
#> Initial best: 0.77995
#> 1 ◯ accept suboptimal roc_auc=0.77986 (+/-0.01014)
#> 2 ◯ accept suboptimal roc_auc=0.77986 (+/-0.01014)
#> 3 ◯ accept suboptimal roc_auc=0.77982 (+/-0.01013)
#> 4 ♥ new best roc_auc=0.78037 (+/-0.01008)
#> 5 ◯ accept suboptimal roc_auc=0.77986 (+/-0.01014)
#> 6 ◯ accept suboptimal roc_auc=0.77986 (+/-0.01014)
#> 7 + better suboptimal roc_auc=0.78027 (+/-0.01008)
#> 8 ◯ accept suboptimal roc_auc=0.77994 (+/-0.01002)
#> 9 ─ discard suboptimal roc_auc=0.5
#> 10 ◯ accept suboptimal roc_auc=0.76109 (+/-0.01263)

sim_anneal_results$result
#> [[1]]
#> # Tuning results
#> # Bootstrap sampling 
#> # A tibble: 55 × 5
#>    splits            id         .metrics          .notes           .iter
#>    <list>            <chr>      <list>            <list>           <int>
#>  1 <split [375/134]> Bootstrap1 <tibble [10 × 6]> <tibble [0 × 3]>     0
#>  2 <split [375/132]> Bootstrap2 <tibble [10 × 6]> <tibble [0 × 3]>     0
#>  3 <split [375/142]> Bootstrap3 <tibble [10 × 6]> <tibble [0 × 3]>     0
#>  4 <split [375/146]> Bootstrap4 <tibble [10 × 6]> <tibble [0 × 3]>     0
#>  5 <split [375/135]> Bootstrap5 <tibble [10 × 6]> <tibble [0 × 3]>     0
#>  6 <split [375/134]> Bootstrap1 <tibble [1 × 6]>  <tibble [0 × 3]>     1
#>  7 <split [375/132]> Bootstrap2 <tibble [1 × 6]>  <tibble [0 × 3]>     1
#>  8 <split [375/142]> Bootstrap3 <tibble [1 × 6]>  <tibble [0 × 3]>     1
#>  9 <split [375/146]> Bootstrap4 <tibble [1 × 6]>  <tibble [0 × 3]>     1
#> 10 <split [375/135]> Bootstrap5 <tibble [1 × 6]>  <tibble [0 × 3]>     1
#> # … with 45 more rows

Created on 2023-03-08 with reprex v2.0.2
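
If you don't need to reuse an earlier grid search, the finetune documentation also describes initial as accepting a positive integer, in which case tune_sim_anneal() generates that many starting candidates itself. Here is a minimal sketch of that variant. It resamples the full parabolic data to stay short, and the names spec, resamples, wf_set_int and sa_int_results are placeholders I made up for this example; initial = 5 and iter = 5 are arbitrary choices.

```r
library(tidymodels)
library(finetune)

data(parabolic)

set.seed(2)
resamples <- bootstraps(parabolic, times = 5)

spec <-
  logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# Store an integer `initial` as the option instead of a grid:
# tune_sim_anneal() then builds its own starting candidates.
wf_set_int <-
  workflow_set(
    preproc = list("formula" = class ~ .),
    models = list(lm = spec)
  ) %>%
  option_add(id = "formula_lm", initial = 5)

sa_int_results <-
  wf_set_int %>%
  workflow_map(
    fn = "tune_sim_anneal",
    seed = 9999,
    resamples = resamples,
    metrics = metric_set(roc_auc),
    iter = 5  # passed on to tune_sim_anneal() through `...`
  )

# Summarise the candidates that were tried, best first.
rank_results(sa_int_results, rank_metric = "roc_auc")
```

Seeding from a previous tune_grid() result, as in the reprex above, is still the better fit when you want the annealing to start from specific candidates you chose yourself.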

Thank you, that really helps clarify things for me!

I was mistakenly assuming every tune_* function required a grid, and I could mix and match an arbitrary tune_* function with a grid_* output. Good to learn that that isn't always the case (and was in the docs all along :grimacing:) - thanks again!
