Error during tuning when using add_recipe() instead of add_formula() in a tidymodels workflow

I'm trying to use Julia Silge's tutorial on xgboost to build an analysis on an unbalanced dataset (Tune XGBoost with tidymodels and #TidyTuesday beach volleyball | Julia Silge). I made a single change to her code so that I could add upsampling (e.g., step_upsample(...)) as a recipe step. However, when I pass a recipe() to add_recipe() rather than using the add_formula() call in her code, the tuning step fails. For example, I inserted the recipe call here, and the workflow then prints as shown below the code:

xgb_wf <- workflow() %>%
  add_recipe(recipe(win ~ ., data = vb_train)) %>%
  # add_formula(win ~ .) %>%
  add_model(xgb_spec)

══ Workflow ═════════
Preprocessor: Recipe
Model: boost_tree()

── Preprocessor ────────
0 Recipe Steps

── Model ───────────
Boosted Tree Model Specification (classification)

Main Arguments:
mtry = tune()
trees = 1000
min_n = tune()
tree_depth = tune()
learn_rate = tune()
loss_reduction = tune()
sample_size = tune()

Computational engine: xgboost

But then at the tune_grid() step I get an error:

xgb_res <- tune_grid(
  xgb_wf,
  resamples = vb_folds,
  grid = xgb_grid,
  control = control_grid(save_pred = TRUE)
)

Fold10: preprocessor 1/1, model 30/30: Error in xgboost::xgb.DMatrix(x, label = y, missing = NA): 'data' has class 'character' and length 193500.

Does anyone have any hints on what I can do to fix it?

There are similar error messages reported here: tidymodels: error when predicting on new data with xgboost model

and here: xgboost works with add_formula but not with recipe

and here: Tidymodels: Error in xgboost::xgb.DMatrix(data = newdata, missing = NA): 'data' has class 'character' and length 29241. #> 'data' accepts either a numeric matrix or a single filename.

Thanks,

Rich

This appears to be the same issue as this post

The data that you are using contains factor columns and xgboost does not allow for non-numeric predictors (unlike almost every other tree-based model). There is some documentation here.

Use a recipe with step_dummy() to solve this. An easy way to do this is via the usemodels package (if you are unfamiliar with recipes).
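To see why the bare recipe fails, here is a minimal sketch on hypothetical toy data (the data frame `toy` and its values are invented for illustration and are not the volleyball data). It assumes a recent version of recipes that provides all_nominal_predictors(). A recipe with no steps leaves factor predictors as factors, whereas step_dummy() turns them into numeric indicator columns, which is the form xgboost needs.

```r
library(recipes)

# Hypothetical toy data standing in for vb_train: two factor predictors,
# one numeric predictor, and a factor outcome.
toy <- data.frame(
  win     = factor(c("win", "lose", "win", "lose", "win", "lose")),
  gender  = factor(c("M", "W", "M", "W", "M", "W")),
  circuit = factor(c("AVP", "FIVB", "AVP", "FIVB", "FIVB", "AVP")),
  age     = c(24, 31, 27, 29, 33, 26)
)

# A recipe with no steps passes the factor predictors through unchanged.
bare <- recipe(win ~ ., data = toy) %>%
  prep() %>%
  bake(new_data = NULL)

# step_dummy() replaces each factor predictor with numeric indicator columns;
# all_nominal_predictors() selects the factor predictors but not the outcome.
dummied <- recipe(win ~ ., data = toy) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep() %>%
  bake(new_data = NULL)

# Compare the column classes before and after.
print(sapply(bare, class))
print(sapply(dummied, class))
```

Using all_nominal_predictors() instead of naming columns means the recipe keeps working if more factor predictors are added later.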


Thanks, Max. I now understand that add_formula() automagically creates the dummy variables, but they have to be added manually when using a recipe, e.g.:

vb_recipe <- recipe(win ~ ., data = vb_train) %>%
  step_dummy(circuit, gender)

xgb_wf <- workflow() %>%
  add_recipe(vb_recipe) %>%
  # add_formula(win ~ .) %>%
  add_model(xgb_spec)
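Since the original goal was upsampling, a recipe along the following lines may be a starting point. This is a sketch under assumptions, not the tutorial's code: it assumes the themis package (which provides step_upsample()), it uses all_nominal_predictors() rather than naming the columns, and it runs on an invented toy data frame rather than vb_train.

```r
library(recipes)
library(themis)  # assumption: themis is installed; it provides step_upsample()

set.seed(123)  # upsampling draws rows at random

# Hypothetical unbalanced toy data: 8 "win" rows and 2 "lose" rows.
toy <- data.frame(
  win    = factor(c(rep("win", 8), rep("lose", 2))),
  gender = factor(rep(c("M", "W"), 5)),
  age    = c(24, 31, 27, 29, 33, 26, 22, 35, 28, 30)
)

toy_recipe <- recipe(win ~ ., data = toy) %>%
  # Resample the minority class up to the size of the majority class.
  step_upsample(win) %>%
  # Convert the remaining factor predictors to numeric indicators for xgboost.
  step_dummy(all_nominal_predictors())

# bake(new_data = NULL) returns the processed training data,
# including the upsampled rows.
baked <- toy_recipe %>%
  prep() %>%
  bake(new_data = NULL)

print(table(baked$win))
```

In the workflow above, the same two steps would go into vb_recipe before it is passed to add_recipe(). By default step_upsample() is skipped when the recipe is applied to new data, so the assessment folds are not resampled during tuning.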
