mtry tuning range is integer even when counts = FALSE with the xgboost engine

The parsnip documentation page "Boosted trees via xgboost" (details_boost_tree_xgboost) states that you can pass counts = FALSE to set_engine() to supply mtry as a proportion in [0, 1]. If I fix mtry at a value in [0, 1], I can use tune_sim_anneal() to tune the other parameters. When mtry = tune(), however, the mtry range is set to integers with an unknown upper limit. With counts = FALSE I expected mtry to be a double on [0, 1]. Is there a way to set the mtry range as a proportion when tuning with tune_sim_anneal()?

library(tidymodels)

xgb_reg <-
  boost_tree(
    mtry = tune(),
    trees = tune(),
    min_n = tune(),
    tree_depth = tune(),
    learn_rate = tune(),
    loss_reduction = tune(),
    sample_size = tune()) %>%
  set_engine("xgboost", counts = FALSE) %>%
  set_mode("classification")

# Inspecting the mtry object below shows that it is an integer parameter
# with a lower limit of 1L and an unknown upper limit
extract_parameter_set_dials(xgb_reg) %>%
  filter(name == "mtry") %>%
  pull(object) %>%
  str()
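To make the two conventions concrete: with the default counts = TRUE, mtry is a number of predictor columns, and (as I understand parsnip's xgboost wrapper) it is divided by the number of predictor columns before being handed to xgboost as the colsample_bynode proportion; with counts = FALSE the value is passed through as a proportion. The helper mtry_to_colsample() below is hypothetical, a simplified base-R sketch of that conversion under this assumption, not parsnip's actual code.

```r
# Hypothetical helper (not part of parsnip): a simplified sketch of how an
# mtry value could become the column-sampling proportion xgboost expects,
# assuming counts = TRUE means "divide by the number of predictor columns".
mtry_to_colsample <- function(mtry, n_predictors, counts = TRUE) {
  if (counts) {
    # mtry is a number of columns: convert it to a fraction of all columns
    prop <- mtry / n_predictors
  } else {
    # mtry is already a proportion, so it must lie in (0, 1]
    if (mtry <= 0 || mtry > 1) {
      stop("with counts = FALSE, mtry must be in (0, 1]")
    }
    prop <- mtry
  }
  # A count larger than the number of columns still means "use every column"
  min(prop, 1)
}

# With 4 predictor columns, mtry = 2 as a count and mtry = 0.5 as a
# proportion describe the same amount of column subsampling.
mtry_to_colsample(2, n_predictors = 4, counts = TRUE)
mtry_to_colsample(0.5, n_predictors = 4, counts = FALSE)
```

This is why the two settings need different tuning ranges: a count needs a data-dependent upper bound, while a proportion is always bounded by 1.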

Hi @mbanghart!

If you'd like to tune over mtry with simulated annealing, you can:

  • set counts = FALSE and then pass a custom parameter set (with mtry updated to mtry_prop()) to param_info, or
  • leave counts at its default (TRUE) and first tune over a grid, which finalizes the unknown upper limit for mtry, then pass those results to tune_sim_anneal() through its initial argument
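For the second option, the unknown upper limit for mtry is the number of predictor columns the model actually sees, which depends on the data and the preprocessing; tune_grid() fills it in from the training data (see the "finalize unknown parameter: mtry" message in the reprex). As a rough base-R illustration, assuming the factor island is one-hot encoded into one column per level, that count can be computed with model.matrix(). The helper mtry_upper_bound() and the toy data are hypothetical.

```r
# Toy data standing in for the penguins predictors (hypothetical values)
toy <- data.frame(
  flipper_length_mm = c(181, 186, 195, 193, 190, 210),
  island = factor(c("Torgersen", "Biscoe", "Dream",
                    "Biscoe", "Dream", "Torgersen"))
)

# Hypothetical helper: count the predictor columns after encoding.
# Dropping the intercept (- 1) makes model.matrix() keep an indicator column
# for every level of the factor, i.e. one-hot encoding in this formula.
mtry_upper_bound <- function(formula, data) {
  ncol(model.matrix(formula, data = data))
}

upper <- mtry_upper_bound(~ flipper_length_mm + island - 1, toy)

# A count-based mtry range would then be 1 to `upper`; a proportion range
# needs no such step because it is always bounded by 1.
mtry_range <- c(1L, upper)
mtry_range
```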

Here's some example code demonstrating both ways of tuning mtry with simulated annealing.

library(tidymodels)
library(finetune)

data(penguins, package = "modeldata")

# as a proportion:
bt_tune_prop <-
  boost_tree(mtry = tune()) %>%
  set_engine(engine = "xgboost", counts = FALSE) %>%
  set_mode(mode = "classification")

grid_anneal_prop <-
  tune_sim_anneal(
    bt_tune_prop,
    species ~ flipper_length_mm + island,
    bootstraps(penguins),
    param_info = 
      extract_parameter_set_dials(bt_tune_prop) %>% 
      update(mtry = mtry_prop())
  )
#> 
#> ❯  Generating a set of 1 initial parameter results
#> ✓ Initialization complete
#> 
#> Optimizing roc_auc
#> Initial best: 0.95642
#>  1 ◯ accept suboptimal  roc_auc=0.95613  (+/-0.00232)
#>  2 ♥ new best           roc_auc=0.9568   (+/-0.002101)
#>  3 ◯ accept suboptimal  roc_auc=0.95566  (+/-0.002179)
#>  4 ♥ new best           roc_auc=0.96007  (+/-0.002062)
#>  5 ♥ new best           roc_auc=0.96007  (+/-0.002045)
#>  6 ♥ new best           roc_auc=0.96185  (+/-0.002087)
#>  7 ◯ accept suboptimal  roc_auc=0.96165  (+/-0.002138)
#>  8 ◯ accept suboptimal  roc_auc=0.96149  (+/-0.002164)
#>  9 ♥ new best           roc_auc=0.96191  (+/-0.001738)
#> 10 ◯ accept suboptimal  roc_auc=0.96098  (+/-0.001949)

autoplot(grid_anneal_prop)


# as a count:
bt_tune_count <-
  boost_tree(mtry = tune()) %>%
  set_engine(engine = "xgboost") %>%
  set_mode(mode = "classification")

grid <-
  tune_grid(
    bt_tune_count,
    species ~ flipper_length_mm + island,
    bootstraps(penguins)
  )
#> i Creating pre-processing data to finalize unknown parameter: mtry

grid_anneal <-
  tune_sim_anneal(
    bt_tune_count,
    species ~ flipper_length_mm + island,
    bootstraps(penguins),
    initial = grid
  )
#> Optimizing roc_auc
#> Initial best: 0.96087
#>  1 ◯ accept suboptimal  roc_auc=0.95825  (+/-0.001964)
#>  2 ◯ accept suboptimal  roc_auc=0.95396  (+/-0.001823)
#>  3 ◯ accept suboptimal  roc_auc=0.94992  (+/-0.001875)
#>  4 + better suboptimal  roc_auc=0.95342  (+/-0.001994)
#>  5 ◯ accept suboptimal  roc_auc=0.94992  (+/-0.001875)
#>  6 + better suboptimal  roc_auc=0.95342  (+/-0.002041)
#>  7 ◯ accept suboptimal  roc_auc=0.94992  (+/-0.001875)
#>  8 ✖ restart from best  roc_auc=0.95408  (+/-0.002)
#>  9 ◯ accept suboptimal  roc_auc=0.95773  (+/-0.001989)
#> 10 ◯ accept suboptimal  roc_auc=0.95432  (+/-0.002001)

autoplot(grid_anneal)

Created on 2022-07-14 by the reprex package (v2.0.1)

If this doesn't do the trick for you, could you modify this code to demonstrate the functionality you're hoping to see?

Thanks!

Simon,
Thanks. Using update with the mtry_prop() was just what I needed.
Mark
