How to run tune_bayes for xgboost hyperparameter tuning using the output of tune_grid

Hi,

I am trying to use the output of tune_grid() as the starting point for tune_bayes(), as shown in the first part of chapter 14 of TMwR. I am fitting an xgboost model, but I get Error in check_gp_failure() when running tune_bayes(), whether I pass the output of tune_grid() to the initial argument or use an integer, as suggested by the help page.
Reprex attached for reference:

library(tidyverse)
library(tidymodels)
set.seed(41)

# Load data
data(cells)
cells <- cells %>% select(-case)

# Split
cells_split <- initial_split(cells)
cells_train <- training(cells_split)
cells_test <- testing(cells_split)
cells_folds <- vfold_cv(cells_train, v = 5)

# Preprocessor
cells_recipe <- recipe(class ~ ., data = cells_train) %>% 
  step_nzv(all_predictors()) %>% 
  step_corr(all_predictors())

# Create model
xgb_spec <- boost_tree(mode = "classification",
                       trees = tune(),
                       mtry = tune(),
                       tree_depth = tune(),
                       min_n = tune(),
                       sample_size = tune(),
                       loss_reduction = tune(),
                       learn_rate = tune()
) %>% 
  set_engine("xgboost", importance = "permutation")

# Merge into workflow
cells_wf <- workflow() %>% 
  add_model(xgb_spec) %>% 
  add_recipe(cells_recipe)

xgb_grid <- grid_latin_hypercube(
  trees(),
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), cells_train),
  learn_rate(),
  size = 20
)

# Build grid and tune
xgb_tune_results <- tune_grid(
  cells_wf,
  resamples = cells_folds,
  grid = xgb_grid,
  control = control_grid(save_pred = TRUE),
  metrics = metric_set(roc_auc)
)

bayes_param <- cells_wf %>% 
  extract_parameter_set_dials()

xgb_tune_bayes <- cells_wf %>% 
  tune_bayes(
    iter = 10,
    resamples = cells_folds, 
    param_info = bayes_param,
    metrics = metric_set(roc_auc), 
    initial = xgb_tune_results,
    control = control_bayes(save_pred = TRUE, verbose = TRUE)
  )
#> Optimizing roc_auc using the expected improvement
#> 
#> ── Iteration 1 ─────────────────────────────────────────────────────────────────
#> 
#> i Current best:      roc_auc=0.8958 (@iter 0)
#> i Gaussian process model
#> x Gaussian process model: Error in `.f()`:
#> ! The parameter object contains...
#> Error in `check_gp_failure()`:
#> ! Gaussian process model was not fit.
#> ✖ Optimization stopped prematurely; returning current results.

I can't seem to find the problem here, but clearly I'm missing something.
Any help is greatly appreciated.

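You'll need to update the bayes_param object to give it a range for mtry. The upper bound of mtry() depends on the number of predictors, so it is unknown until you finalize it with the data. You finalized it inside xgb_grid, which is why tune_grid() ran, but the parameter set extracted from the workflow still has the unfinalized version.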
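To see the unknown bound for yourself, here is a minimal sketch using dials directly. The mtcars predictors are only a stand-in for illustration and are not part of the reprex above.

```r
library(dials)

# mtry() ships with an open upper bound because the number of
# predictors is only known once the data have been seen.
mtry_raw <- mtry()
raw_unknown <- has_unknowns(mtry_raw)

# Stand-in predictor set (mtcars without mpg), used only for illustration.
predictors <- mtcars[, -1]

# finalize() fills in the upper bound from the number of columns.
mtry_final <- finalize(mtry_raw, predictors)
final_unknown <- has_unknowns(mtry_final)

print(mtry_raw)
print(mtry_final)
```

Printing the two objects side by side shows the range before and after finalizing. The Gaussian process in tune_bayes() needs a complete range for every parameter, which is presumably why it stops when one bound is still unknown.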


Hi Max,

Thanks for the help. As you suggested, using

bayes_param <- cells_wf %>% 
  extract_parameter_set_dials() %>% 
  update(mtry = finalize(mtry(), cells_train))

fixed the issue.
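For anyone finding this later, a small variation that may be worth considering: finalize() takes the upper bound from the number of columns in the data it is given, so passing cells_train also counts the class outcome column. The sketch below finalizes on the predictors only, or sets an explicit range instead. It uses a cut-down model spec rather than the full workflow, and the upper limit of 20 is an arbitrary value chosen for illustration.

```r
library(tidymodels)

data(cells)
cells <- cells %>% select(-case)

# Predictors only, so the outcome column is not counted.
cells_predictors <- cells %>% select(-class)

# Cut-down spec for illustration; the full spec works the same way.
xgb_spec_small <- boost_tree(
  mode = "classification",
  trees = tune(),
  mtry = tune()
) %>%
  set_engine("xgboost")

# Option 1: finalize mtry from the predictor columns.
param_finalized <- xgb_spec_small %>%
  extract_parameter_set_dials() %>%
  update(mtry = finalize(mtry(), cells_predictors))

# Option 2: set an explicit range (the upper limit 20 is arbitrary).
param_manual <- xgb_spec_small %>%
  extract_parameter_set_dials() %>%
  update(mtry = mtry(range = c(1L, 20L)))

# Neither parameter set should contain unknown bounds now.
has_unknowns(extract_parameter_dials(param_finalized, "mtry"))
has_unknowns(extract_parameter_dials(param_manual, "mtry"))
```

Either object can then be passed as param_info to tune_bayes(). Note that the recipe also drops columns with step_nzv() and step_corr(), so the number of predictors that reach the model may be smaller than either bound.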
