Saving workflow xgboost for later use

Hi everyone,

I am relatively new to R, so sorry in advance if this question is weirdly phrased or unclear.

I built an xgboost model with the tidymodels framework, as demonstrated by Julia in her YouTube video on the topic (Tuning XGBoost using tidymodels - YouTube).
The problem I now run into is that I would like to save the model for later use (so I can load it without building the model again). Saving it with the saveRDS function might cause compatibility issues if package versions are updated in the future. A more robust option seems to be "xgb.save" or "xgb.save.raw", but those functions require a model of class xgb.Booster, while my model is of class workflow. Is there a way to convert a workflow to an xgb.Booster model within the tidymodels framework that I'm using, or should I address this problem from another angle and look for a better way to save workflows instead?

I would appreciate a response of any kind, and I hope I have explained the problem clearly.

Below is some of my code to clarify:

library(tidymodels)

xgb_bundes <- boost_tree(
  trees = tune(),
  tree_depth = tune(), 
  sample_size = tune(), 
  mtry = tune(),
  learn_rate = 0.06
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

xgb_grid <- grid_latin_hypercube(
  trees(),
  tree_depth(),
  sample_size = sample_prop(),
  finalize(mtry(), data_train),
  size = 30
)

xgb_wf_bundes <- workflow() %>%
  add_formula(over_under ~ .) %>% 
  add_model(xgb_bundes)

set.seed(210)
data_fold_cv <- vfold_cv(data_train, v = 10, strata = over_under)

library(doParallel)
cores <- detectCores()

# Create a cluster that leaves one core free
cl <- makeCluster(cores[1] - 1)
# Register cluster
registerDoParallel(cl)

set.seed(285)
xgb_bundes_result <- tune_grid(
  xgb_wf_bundes,          
  resamples = data_fold_cv, 
  grid = xgb_grid,          
  control = control_grid(save_pred = TRUE)
)

# Release the parallel workers once tuning is done
stopCluster(cl)

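metrics_results <- xgb_bundes_result %>% collect_metrics()
# select_best() returns the single best parameter combination,
# which is what finalize_workflow() expects
best <- select_best(xgb_bundes_result, metric = "accuracy")
final_xgb_bundes <- finalize_workflow(xgb_wf_bundes, best)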

model_bundes <- final_xgb_bundes %>% fit(data_train)
class(model_bundes)
[1] "workflow"

xgb.save(model_bundes, "model bundes")

---Which gives the following error:
Error in xgb.save(model_bundes, "model bundes") : model must be xgb.Booster.

Hi there!

This should be easy to fix if you can create a very simple reprex (FAQ: How to do a minimal reproducible example (reprex) for beginners). That way we can see exactly how the model objects are captured and how to get them into the right format for you.

Thanks for your reply:). I attached my code to the post.

Hi :slight_smile: This is not a reprex (which means I cannot run it on my local system as is). I had a look at what she did here: Tune XGBoost with tidymodels and #TidyTuesday beach volleyball | Julia Silge. I am not going to set up and run all of that. As you will see, at the very end she ran this:

final_xgb <- finalize_workflow(
  xgb_wf,
  best_auc
)

final_xgb

Rather use that than another workflow object. See how they go about it here (different from yours, but a lot clearer given that it is also xgboost):
https://cran.r-project.org/web/packages/tidypredict/vignettes/xgboost.html

Look at what they do with the output: they pass it to parse_model(). If you can get to the object returned by parse_model(model), as in their example, you can save it as a .yml file. That should give you everything you need to avoid saving it as .RDS (see here: Save and re-load models • tidypredict).

Following up after a good while, for folks who may come across this in the future:

  1. The xgboost fit object lives inside your model workflow, so the workflow needs the same robustness considerations as the xgboost fit itself (which ought to be saved with xgboost's native serialization functions).

  2. No conversion to xgb.Booster is necessary. You will indeed want to approach this by figuring out how to save the whole workflow: the workflow object contains information needed to safely and easily interface with the xgboost object, and the workflow itself can't be saved with xgboost's serialization functions.
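
To make the first point concrete, here is a minimal sketch of where the xgboost fit sits inside a fitted workflow. The mtcars toy data, the formula, and the names cars, spec, wf_fit, and booster are assumptions for illustration only and are not taken from the original post; extract_fit_engine() is the workflows accessor for the underlying engine fit.

```r
library(tidymodels)  # attaches parsnip, workflows, and friends

# Toy data (an assumption for illustration): transmission type in mtcars
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

spec <- boost_tree(trees = 20) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

wf_fit <- workflow() %>%
  add_formula(am ~ mpg + hp + wt) %>%
  add_model(spec) %>%
  fit(data = cars)

# The outer object is a workflow ...
class(wf_fit)

# ... and the engine-level xgboost fit can be pulled out of it
booster <- extract_fit_engine(wf_fit)
class(booster)
```

The extracted booster is the kind of object that xgb.save() accepts, but saving it alone would drop the preprocessing and parsnip layers that the workflow wraps around it, which is why saving the whole workflow is the better route.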

The folks from the tidymodels team put together a new package, bundle, to provide a consistent interface for native serialization of model objects. bundle knows to call the correct saving/reloading functions for xgboost, parsnip, and workflows in succession. The bundle() verb prepares a model object for serialization; you can then safely saveRDS() it, readRDS() it in another R session, and unbundle() it there. So, for your workflow model_bundes, fitted with xgboost:

library(bundle)

mod_bundle <- bundle(model_bundes)
saveRDS(mod_bundle, file = "path/to/file.rds")

# in a new R session:
library(bundle)
mod_bundle <- readRDS("path/to/file.rds")
mod_new <- unbundle(mod_bundle)

...should do the trick. :slight_smile:
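
If you want to convince yourself that the round trip preserves the model, a rough check could look like the sketch below. It uses a small toy workflow on mtcars (an assumption for illustration, as are the names cars, wf_fit, path, and wf_restored) and compares predictions before and after bundling. Note that it runs in a single session, so it only exercises the bundle(), saveRDS(), readRDS(), unbundle() sequence and does not prove anything about package upgrades.

```r
library(tidymodels)
library(bundle)

# Toy data (an assumption for illustration): transmission type in mtcars
cars <- mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

wf_fit <- workflow() %>%
  add_formula(am ~ mpg + hp + wt) %>%
  add_model(
    boost_tree(trees = 20) %>%
      set_engine("xgboost") %>%
      set_mode("classification")
  ) %>%
  fit(data = cars)

preds_before <- predict(wf_fit, new_data = cars, type = "prob")

# Bundle the fitted workflow and write it to a temporary file
path <- tempfile(fileext = ".rds")
saveRDS(bundle(wf_fit), file = path)

# Read it back and restore a usable workflow
wf_restored <- unbundle(readRDS(path))
preds_after <- predict(wf_restored, new_data = cars, type = "prob")

# Compare class probabilities from the original and restored workflows
all.equal(preds_before, preds_after)
```

For a stricter test, the readRDS() and unbundle() half can be run in a genuinely fresh R session, as in the snippet above.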