Error in predict() when using stacks in a tidymodels workflow: sqrt non-numeric argument

Hi, I am a newcomer to tidymodels (and R in general) and have worked through the (fantastic) online literature that is available from yourselves and other bloggers to very nearly run a complete workflow.

I am able to successfully make predictions from several models independently but wanted to use the 'stacks' process to improve the overall prediction.

Here is a snippet of my code so far:

# build recipe
trainDataHomeXg.recipe <- training(lgDataHomeXg.split) %>%
  recipe(labelHomeXg ~ .) %>%
  step_normalize(all_predictors()) %>%
  step_sqrt(all_outcomes()) %>%
  prep()

# bake
trainDataHomeXg.bake <-
  bake(trainDataHomeXg.recipe, training(lgDataHomeXg.split))
trainDataHomeXg.folds <-
  trainDataHomeXg.bake %>% vfold_cv(v = 5, repeats = 2)

# build XGBoost model
xGTrainDataXg.model <-
  parsnip::boost_tree(
    mode = "regression",
    trees = tune(),
    min_n = tune(),
    tree_depth = tune(),
    learn_rate = tune(),
    loss_reduction = NULL,
    stop_iter = NULL
  ) %>%
  set_engine("xgboost")

### create workflow
xGTrainDataHomeXg.wflow <-
  workflow() %>% add_recipe(trainDataHomeXg.recipe) %>% add_model(xGTrainDataXg.model)

# tune model
xGTrainDataHomeXg.tuneGridTight <- xGTrainDataHomeXg.wflow %>% tune_grid(
  resamples = trainDataHomeXg.folds,
  metrics = metric_set(rmse),
  grid = 200,
  control = control_stack_grid(),
  param_info = parameters(trees(range = c(550, 1650)),
                          min_n(range = c(50, 100)),
                          tree_depth(range = c(4, 12)),
                          learn_rate(range = c(-2.5, -0.3), trans = log10_trans())
  )
)

## I also build 3 other models (mlp, random forest and kNN), but the error occurs even if I
## add just the one model


trainDataHomeXg.stack <- stacks() %>%
  add_candidates(xGTrainDataHomeXg.tuneGridTight)

trainDataHomeXg.stackBlendPred <-
  trainDataHomeXg.stack %>% blend_predictions()

trainDataHomeXg.stackFitMembers <- trainDataHomeXg.stackBlendPred %>% fit_members()

trainDataHomeXg.stackPred <- testing(lgDataHomeXg.split) %>% bind_cols(predict(trainDataHomeXg.stackFitMembers, .))

It's at this stage that I receive an error:
Error in sqrt(getElement(new_data, col_names[i])) : non-numeric argument to mathematical function

I have tried the usual online resources but am struggling to make any headway. I'm not sure whether the sqrt is part of the 'rmse' calculation or comes from the 'step_sqrt' I put in the recipe. I tried baking the data before passing it to predict(), but this doesn't help.

This is my first time asking online, so if you need me to provide any more information, or the information in a different format, please let me know.

Thanks
Chris

You do not have to use prep() when setting up the workflow. Also, you don't have to bake() before making the resamples. All of this gets done automatically during resampling.

I suspect that may be the issue.
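
To illustrate the point about prep() and bake(), here is a minimal sketch that uses the built-in mtcars data and a plain linear model as stand-ins (both are assumptions for illustration, not the data or models from the question). The recipe is only specified, never prepped, and the folds are built from the raw training data; the workflow estimates and applies the recipe inside each resample.

```r
library(tidymodels)

set.seed(123)

# Stand-in data: mtcars, with mpg as the outcome
car_split <- initial_split(mtcars)
car_train <- training(car_split)

# The recipe is only specified here; no prep() and no bake()
car_recipe <- recipe(mpg ~ ., data = car_train) %>%
  step_normalize(all_predictors())

# Folds are created from the raw (unbaked) training data
car_folds <- vfold_cv(car_train, v = 5)

car_wflow <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# The recipe is estimated and applied within each resample
car_resamples <- fit_resamples(
  car_wflow,
  resamples = car_folds,
  metrics = metric_set(rmse)
)

collect_metrics(car_resamples)
```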

Also, we advocate transforming outcomes outside of the recipe. tidymodels walls off the outcomes when making predictions, so a recipe step that operates on the outcome will fail at predict() time because your outcome will not be in the data.

Here is what I suggest (you'll need to fill in the blanks):

# whatever the original data is: 

original_data <- 
  original_data %>% 
  mutate(labelHomeXg = sqrt(labelHomeXg))

lgDataHomeXg.split <- initial_split(original_data)

lgDataHomeXg.train <- training(lgDataHomeXg.split)

trainDataHomeXg.recipe <-  
  lgDataHomeXg.train %>%
  recipe(labelHomeXg ~ .) %>%
  step_normalize(all_predictors()) 

trainDataHomeXg.folds <- vfold_cv(lgDataHomeXg.train, v = 5, repeats = 2)
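
One consequence of transforming the outcome up front is that predictions come back on the square-root scale, so they need to be squared to return to the original units. A minimal sketch of that back-transformation, again assuming mtcars and a linear model as stand-ins rather than the original data or the stacked model (the `car_*` names and the `*_original` columns are hypothetical):

```r
library(tidymodels)

set.seed(123)

# Stand-in data: square-root the outcome before splitting
car_data <- mtcars %>% mutate(mpg = sqrt(mpg))
car_split <- initial_split(car_data)
car_train <- training(car_split)

car_wflow <- workflow() %>%
  add_recipe(
    recipe(mpg ~ ., data = car_train) %>%
      step_normalize(all_predictors())
  ) %>%
  add_model(linear_reg() %>% set_engine("lm"))

car_fit <- fit(car_wflow, data = car_train)

# Predictions are on the sqrt scale; square them to undo the transformation
car_pred <- testing(car_split) %>%
  bind_cols(predict(car_fit, new_data = .)) %>%
  mutate(
    mpg_original   = mpg^2,
    .pred_original = .pred^2
  )
```

Metrics such as rmse computed during tuning will likewise be on the square-root scale unless they are recomputed from the back-transformed columns.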

Thank you so much, that did indeed fix the issue. I really appreciate your speedy and clear answer.

Just a point of note - some newbie user feedback. I have read the tidymodels literature but didn't take away from it that I didn't need to prep() and bake() because the workflow() would do it for me. Maybe I just missed it, but if it is important for users to know, maybe it could be promoted a bit more. Thanks again.
