Error using tune() in bagged trees (Tidymodels)

Hi,

I am tuning a bagged tree model for a binary outcome:

bagtree_spec <- bag_tree(cost_complexity = tune(), tree_depth = tune(), min_n = tune()) %>%
  set_engine("rpart", times = 25) %>%
  set_mode("classification")

...

bagtree_tune <- bagtree_wf %>%
  tune_grid(resamples = folds,
            metrics = metric_set(sens, spec, roc_auc),
            control = control_grid(save_pred = FALSE),
            grid = 20)

I get error messages while tuning:
x Fold1: preprocessor 1/1, model 9/20: Error: Input must be a vector, not NULL.
x Fold2: preprocessor 1/1, model 9/20: Error: Input must be a vector, not NULL.
x Fold3: preprocessor 1/1, model 9/20: Error: Input must be a vector, not NULL.
x Fold3: preprocessor 1/1, model 16/20: Error: Input must be a vector, not NULL.
x Fold4: preprocessor 1/1, model 9/20: Error: Input must be a vector, not NULL.
x Fold5: preprocessor 1/1, model 9/20: Error: Input must be a vector, not NULL.

Oddly, the tuning process itself does not fail; it finishes despite these messages. I searched around and found that cost_complexity needs to be < 1, so I ran the code again with cost_complexity = 0 (i.e., not tuned) and it completed with no messages. If both scenarios finish, why am I seeing these error messages in one case and not the other?
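
The text behind those "x Fold..." messages is stored alongside each resample, which makes it possible to see which candidate triggered them. A minimal sketch, assuming the bagtree_tune object from above (the exact layout of the .notes column differs between tune versions):

# tidymodels (dplyr + tidyr) is already attached; each row of the tuning
# result carries a .notes tibble with the warning/error text for that resample
bagtree_tune %>%
  dplyr::select(id, .notes) %>%
  tidyr::unnest(.notes)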

UPDATE: I found that the bagged tree was only tuned over 19 parameter combinations even though I requested 20. I suspect the grid generated by tune() includes some values outside the appropriate range for the hyperparameter, and those candidates fail.
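
As a sanity check on the 19-of-20 count, and to keep the grid inside a range I control, something like the following could work. This is only a sketch built from the objects above; the cost_complexity range is an arbitrary illustration, not a recommendation (parameters() on a workflow was later replaced by extract_parameter_set_dials() in newer releases):

# Count how many candidate combinations actually produced metrics
bagtree_tune %>%
  collect_metrics() %>%
  distinct(cost_complexity, tree_depth, min_n) %>%
  nrow()

# Pass an explicit parameter set so the grid stays in a known range
bagtree_params <- bagtree_wf %>%
  parameters() %>%
  update(cost_complexity = cost_complexity(range = c(-5, -1)))  # log10 scale

bagtree_tune2 <- bagtree_wf %>%
  tune_grid(resamples = folds,
            param_info = bagtree_params,
            metrics = metric_set(sens, spec, roc_auc),
            control = control_grid(save_pred = FALSE),
            grid = 20)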

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 R2 x64 (build 9600)

Matrix products: default

Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] xgboost_1.3.2.1 caret_6.0-85 lattice_0.20-38 kernlab_0.9-29
[5] rpart_4.1-15 vctrs_0.3.6 rlang_0.4.10 themis_0.1.3
[9] baguette_0.1.0 glmnet_4.1-1 Matrix_1.2-17 NeuralNetTools_1.5.2
[13] keras_2.2.5.0 vip_0.3.2 doParallel_1.0.16 iterators_1.0.12
[17] foreach_1.4.8 yardstick_0.0.7 workflows_0.2.2 tune_0.1.3
[21] tidyr_1.1.3 tibble_3.1.0 rsample_0.0.9 recipes_0.1.15
[25] purrr_0.3.4 parsnip_0.1.5 modeldata_0.1.0 infer_0.5.4
[29] ggplot2_3.3.3 dials_0.0.9 scales_1.1.0 broom_0.7.5
[33] tidymodels_0.1.2 dplyr_1.0.5 skimr_2.1.3 readxl_1.3.1

loaded via a namespace (and not attached):
[1] Cubist_0.2.3 colorspace_1.4-1 ellipsis_0.3.0 class_7.3-15
[5] fs_1.3.2 base64enc_0.1-3 rstudioapi_0.13 farver_2.0.3
[9] listenv_0.8.0 furrr_0.2.2 ParamHelpers_1.14 earth_5.3.0
[13] prodlim_2019.11.13 fansi_0.4.1 mvtnorm_1.1-1 lubridate_1.7.10
[17] codetools_0.2-16 splines_3.6.1 libcoin_1.0-8 knitr_1.28
[21] zeallot_0.1.0 Formula_1.2-4 jsonlite_1.6 pROC_1.16.1
[25] tfruns_1.4 compiler_3.6.1 backports_1.1.5 assertthat_0.2.1
[29] cli_2.3.1 htmltools_0.4.0 tools_3.6.1 partykit_1.2-13
[33] gtable_0.3.0 glue_1.4.0 RANN_2.6.1 reshape2_1.4.4
[37] parallelMap_1.5.0 fastmatch_1.1-0 Rcpp_1.0.4.6 cellranger_1.1.0
[41] DiceDesign_1.9 nlme_3.1-140 timeDate_3043.102 inum_1.0-3
[45] mlr_2.19.0 gower_0.2.1 xfun_0.22 stringr_1.4.0
[49] globals_0.14.0 lifecycle_1.0.0 future_1.21.0 MASS_7.3-51.4
[53] ipred_0.9-9 BBmisc_1.11 C50_0.1.3.1 reticulate_1.15
[57] gridExtra_2.3 TeachingDemos_2.12 stringi_1.4.3 tensorflow_2.0.0
[61] plotrix_3.7-8 checkmate_2.0.0 butcher_0.1.3 lhs_1.1.1
[65] hardhat_0.1.5 lava_1.6.7 shape_1.4.5 repr_1.1.0
[69] pkgconfig_2.0.3 labeling_0.3 tidyselect_1.1.0 parallelly_1.24.0
[73] plyr_1.8.6 magrittr_2.0.1 R6_2.4.1 generics_0.1.0
[77] DBI_1.1.0 pillar_1.5.1 whisker_0.4 withr_2.4.2
[81] survival_2.44-1.1 nnet_7.3-12 ROSE_0.0-3 crayon_1.4.1
[85] unbalanced_2.0 utf8_1.1.4 usethis_1.6.1 grid_3.6.1
[89] data.table_1.12.8 FNN_1.1.3 ModelMetrics_1.2.2.1 plotmo_3.6.0
[93] digest_0.6.25 stats4_3.6.1 GPfit_1.0-8 munsell_0.5.0

Can't really tell what's going on behind the scenes since I don't know what kind of data you're working with, but here are a few things I usually try when tidymodels throws these kinds of errors:

  1. try with numerical variables only
  2. give single, exact values instead of tune() in the arguments (see the sketch below)
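
For point 2, a possible fixed-value spec would look like this (the numbers are arbitrary placeholders, not recommendations):

bag_tree(cost_complexity = 0.01, tree_depth = 10, min_n = 20) %>%
  set_engine("rpart", times = 25) %>%
  set_mode("classification")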

Thanks for the reply. What's interesting is that I ran the same code on the same data with the same set.seed() value a few weeks ago and didn't get any errors. See the update in the original post.
