Tuning Baguette models with tidymodels

I was following the guide at https://www.tidymodels.org/start/tuning/ on tuning models with tidymodels, but wanted to try it with a bagged tree model. However, when I try to tune the bagged tree model, I get this warning message:

Warning message: All models failed in tune_grid(). See the .notes column.

In the notes column each entry is:

"internal: Error in rlang::env_get(mod_env, items): argument \"default\" is missing, with no default"

The only difference between my code and the guide is the model type. The model fits if I specify the parameters directly, but I am unable to tune it and I'm not sure why. I also can't find any posts from others with a similar problem using baguette or rpart with tune_grid().

library(tidymodels)
library(baguette)

bag_spec <- 
  bag_tree(tree_depth = tune()) %>%
  set_mode("regression") %>%
  set_engine("rpart", times = 25)

bag_grid <- grid_regular(
  tree_depth(),
  levels = 10
)

bag_wf <- workflow() %>%
  add_formula(QUAL_SCORE_y0 ~ .) %>%
  add_model(bag_spec)

vb_folds <- vfold_cv(df_training)

doParallel::registerDoParallel()

bag_res <- tune_grid(
  bag_wf,
  resamples = vb_folds,
  grid = bag_grid
)

There is currently a bug that is halfway squashed. It is related to using PSOCK clusters (e.g. doParallel::registerDoParallel()).

What OS are you on?

My OS is Windows 10. I'll try the same code without the doParallel call and see if it works.

Can you run remotes::install_dev("baguette") and see if it works then?

I tried this, removed the doParallel::registerDoParallel() call from my code, and encountered a new error.

When I try and fit one bagged tree using:

bag_spec <- 
  bag_tree(tree_depth = 5) %>%
  set_mode("regression") %>%
  set_engine("rpart", times = 25) %>%
  fit(QUAL_SCORE_y0  ~ ., data = df_training)

tidymodels fits the model without error, but when I try to use the above code for any parameter tuning I get:

x Fold01: model 2/25: Error: All of the models failed. An example message was:
Error in `[.data.frame`(m, labs) : undefined columns selected

I have tried the solution from the Stack Overflow question "R caret rpart returns Error in `[.data.frame`(m, labs) : undefined columns selected", to no avail.

It's hard to know if this is a code issue or a package issue. Can you run this reprex?

library(tidymodels)
library(baguette)
library(doParallel)

registerDoParallel()

bagged <-
  bag_tree(cost_complexity = tune()) %>%
  set_engine("rpart", times = 5) %>%
  set_mode("regression")

set.seed(1)
folds <- vfold_cv(mtcars)

set.seed(2)
tuned <-
  bagged %>%
  tune_grid(mpg ~ ., folds, grid = 3)

tuned

> registerDoParallel()
>
> bagged <-
+   bag_tree(cost_complexity = tune()) %>%
+   set_engine("rpart", times = 5) %>%
+   set_mode("regression")
>
> set.seed(1)
> folds <- vfold_cv(mtcars)
>
> set.seed(2)
> tuned <-
+   bagged %>%
+   tune_grid(mpg ~ ., folds, grid = 3)
Warning message:
All models failed in tune_grid(). See the `.notes` column.
>
> tuned
# Tuning results
# 10-fold cross-validation 
# A tibble: 10 x 4
   splits         id     .metrics .notes          
   <list>         <chr>  <list>   <list>          
 1 <split [28/4]> Fold01 <NULL>   <tibble [1 x 1]>
 2 <split [28/4]> Fold02 <NULL>   <tibble [1 x 1]>
 3 <split [29/3]> Fold03 <NULL>   <tibble [1 x 1]>
 4 <split [29/3]> Fold04 <NULL>   <tibble [1 x 1]>
 5 <split [29/3]> Fold05 <NULL>   <tibble [1 x 1]>
 6 <split [29/3]> Fold06 <NULL>   <tibble [1 x 1]>
 7 <split [29/3]> Fold07 <NULL>   <tibble [1 x 1]>
 8 <split [29/3]> Fold08 <NULL>   <tibble [1 x 1]>
 9 <split [29/3]> Fold09 <NULL>   <tibble [1 x 1]>
10 <split [29/3]> Fold10 <NULL>   <tibble [1 x 1]>
Warning message:
This tuning result has notes. Example notes on model fitting include:
internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default
internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default 
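
To read every note rather than only the printed examples, the `.notes` list-column can be flattened. This is a minimal sketch, assuming `.notes` holds one small tibble per resample as the printout above suggests. `res` is a hypothetical stand-in built by hand (it is not a real tune_grid() result) so that the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Error text copied from the thread; used only to fill the stand-in object
msg <- 'internal: Error in rlang::env_get(mod_env, items): argument "default" is missing, with no default'

# Hypothetical stand-in for the tuning result: one row per fold,
# with .notes as a list-column of one-column tibbles
res <- tibble(
  id = c("Fold01", "Fold02"),
  .notes = list(tibble(.notes = msg), tibble(.notes = msg))
)

# One row per note, keeping the fold id alongside the message
all_notes <- res %>%
  select(id, .notes) %>%
  unnest(cols = .notes)

# Tally distinct messages to see whether every fold failed the same way
count(all_notes, .notes)
```

With a real result, the same `select()` and `unnest()` steps would be applied to the object returned by tune_grid().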

OK. After loading all of the packages, can you run sessioninfo::session_info()?

Sure no problem:

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.15   iterators_1.0.12    foreach_1.5.0       baguette_0.0.1.9000 yardstick_0.0.7     workflows_0.2.0     tune_0.1.1          tidyr_1.1.2         tibble_3.0.3        rsample_0.0.8       recipes_0.1.13      purrr_0.3.4         parsnip_0.1.3      
[14] modeldata_0.0.2     infer_0.5.3         ggplot2_3.3.2       dplyr_1.0.2         dials_0.0.9         scales_1.1.1        broom_0.7.0         tidymodels_0.1.1   

loaded via a namespace (and not attached):
 [1] splines_4.0.2      prodlim_2019.11.13 Formula_1.2-3      assertthat_0.2.1   GPfit_1.0-8        globals_0.13.0     ipred_0.9-9        pillar_1.4.6       backports_1.1.10   lattice_0.20-41    glue_1.4.2         pROC_1.16.2        digest_0.6.25      hardhat_0.1.4     
[15] colorspace_1.4-1   Matrix_1.2-18      plyr_1.8.6         timeDate_3043.102  pkgconfig_2.0.3    lhs_1.1.0          DiceDesign_1.8-1   earth_5.2.0        listenv_0.8.0      mvtnorm_1.1-1      gower_0.2.2        lava_1.6.8         Cubist_0.2.3       TeachingDemos_2.12
[29] generics_0.0.2     ellipsis_0.3.1     withr_2.3.0        furrr_0.1.0        nnet_7.3-14        cli_2.0.2          survival_3.1-12    magrittr_1.5       crayon_1.3.4       future_1.19.1      fansi_0.4.1        MASS_7.3-51.6      class_7.3-17       tools_4.0.2       
[43] lifecycle_0.2.0    stringr_1.4.0      munsell_0.5.0      plotrix_3.7-8      compiler_4.0.2     inum_1.0-1         rlang_0.4.7        plotmo_3.6.0       grid_4.0.2         rstudioapi_0.11    C50_0.1.3.1        partykit_1.2-9     gtable_0.3.0       codetools_0.2-16  
[57] reshape2_1.4.4     R6_2.4.1           lubridate_1.7.9    libcoin_1.0-6      stringi_1.5.3      Rcpp_1.0.5         vctrs_0.3.4        rpart_4.1-15       tidyselect_1.1.0 

sessioninfo::session_info() gives a lot more information than sessionInfo().

Sorry about that. Here it is:

- Session info -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/Los_Angeles         
 date     2020-10-06                  

- Packages -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 package       * version    date       lib source                              
 assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.0.2)                      
 backports       1.1.10     2020-09-15 [1] CRAN (R 4.0.2)                      
 baguette      * 0.0.1.9000 2020-10-06 [1] Github (tidymodels/baguette@25ad7af)
 broom         * 0.7.0      2020-07-09 [1] CRAN (R 4.0.2)                      
 C50             0.1.3.1    2020-05-26 [1] CRAN (R 4.0.2)                      
 class           7.3-17     2020-04-26 [2] CRAN (R 4.0.2)                      
 cli             2.0.2      2020-02-28 [1] CRAN (R 4.0.2)                      
 codetools       0.2-16     2018-12-24 [2] CRAN (R 4.0.2)                      
 colorspace      1.4-1      2019-03-18 [1] CRAN (R 4.0.2)                      
 crayon          1.3.4      2017-09-16 [1] CRAN (R 4.0.2)                      
 Cubist          0.2.3      2020-01-10 [1] CRAN (R 4.0.2)                      
 dials         * 0.0.9      2020-09-16 [1] CRAN (R 4.0.2)                      
 DiceDesign      1.8-1      2019-07-31 [1] CRAN (R 4.0.2)                      
 digest          0.6.25     2020-02-23 [1] CRAN (R 4.0.2)                      
 doParallel    * 1.0.15     2019-08-02 [1] CRAN (R 4.0.2)                      
 dplyr         * 1.0.2      2020-08-18 [1] CRAN (R 4.0.2)                      
 earth           5.2.0      2020-09-16 [1] CRAN (R 4.0.2)                      
 ellipsis        0.3.1      2020-05-15 [1] CRAN (R 4.0.2)                      
 fansi           0.4.1      2020-01-08 [1] CRAN (R 4.0.2)                      
 foreach       * 1.5.0      2020-03-30 [1] CRAN (R 4.0.2)                      
 Formula         1.2-3      2018-05-03 [1] CRAN (R 4.0.0)                      
 furrr           0.1.0      2018-05-16 [1] CRAN (R 4.0.2)                      
 future          1.19.1     2020-09-22 [1] CRAN (R 4.0.2)                      
 generics        0.0.2      2018-11-29 [1] CRAN (R 4.0.2)                      
 ggplot2       * 3.3.2      2020-06-19 [1] CRAN (R 4.0.2)                      
 globals         0.13.0     2020-09-17 [1] CRAN (R 4.0.2)                      
 glue            1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                      
 gower           0.2.2      2020-06-23 [1] CRAN (R 4.0.2)                      
 GPfit           1.0-8      2019-02-08 [1] CRAN (R 4.0.2)                      
 gtable          0.3.0      2019-03-25 [1] CRAN (R 4.0.2)                      
 hardhat         0.1.4      2020-07-02 [1] CRAN (R 4.0.2)                      
 infer         * 0.5.3      2020-07-14 [1] CRAN (R 4.0.2)                      
 inum            1.0-1      2019-04-25 [1] CRAN (R 4.0.2)                      
 ipred           0.9-9      2019-04-28 [1] CRAN (R 4.0.2)                      
 iterators     * 1.0.12     2019-07-26 [1] CRAN (R 4.0.2)                      
 lattice         0.20-41    2020-04-02 [2] CRAN (R 4.0.2)                      
 lava            1.6.8      2020-09-26 [1] CRAN (R 4.0.2)                      
 lhs             1.1.0      2020-09-29 [1] CRAN (R 4.0.2)                      
 libcoin         1.0-6      2020-08-14 [1] CRAN (R 4.0.2)                      
 lifecycle       0.2.0      2020-03-06 [1] CRAN (R 4.0.2)                      
 listenv         0.8.0      2019-12-05 [1] CRAN (R 4.0.2)                      
 lubridate       1.7.9      2020-06-08 [1] CRAN (R 4.0.2)                      
 magrittr        1.5        2014-11-22 [1] CRAN (R 4.0.2)                      
 MASS            7.3-51.6   2020-04-26 [2] CRAN (R 4.0.2)                      
 Matrix          1.2-18     2019-11-27 [2] CRAN (R 4.0.2)                      
 modeldata     * 0.0.2      2020-06-22 [1] CRAN (R 4.0.2)                      
 munsell         0.5.0      2018-06-12 [1] CRAN (R 4.0.2)                      
 mvtnorm         1.1-1      2020-06-09 [1] CRAN (R 4.0.0)                      
 nnet            7.3-14     2020-04-26 [2] CRAN (R 4.0.2)                      
 parsnip       * 0.1.3      2020-08-04 [1] CRAN (R 4.0.2)                      
 partykit        1.2-9      2020-07-10 [1] CRAN (R 4.0.2)                      
 pillar          1.4.6      2020-07-10 [1] CRAN (R 4.0.2)                      
 pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.0.2)                      
 plotmo          3.6.0      2020-09-13 [1] CRAN (R 4.0.2)                      
 plotrix         3.7-8      2020-04-16 [1] CRAN (R 4.0.0)                      
 plyr            1.8.6      2020-03-03 [1] CRAN (R 4.0.2)                      
 pROC            1.16.2     2020-03-19 [1] CRAN (R 4.0.2)                      
 prodlim         2019.11.13 2019-11-17 [1] CRAN (R 4.0.2)                      
 purrr         * 0.3.4      2020-04-17 [1] CRAN (R 4.0.2)                      
 R6              2.4.1      2019-11-12 [1] CRAN (R 4.0.2)                      
 Rcpp            1.0.5      2020-07-06 [1] CRAN (R 4.0.2)                      
 recipes       * 0.1.13     2020-06-23 [1] CRAN (R 4.0.2)                      
 reshape2        1.4.4      2020-04-09 [1] CRAN (R 4.0.2)                      
 rlang           0.4.7      2020-07-09 [1] CRAN (R 4.0.2)                      
 rpart           4.1-15     2019-04-12 [1] CRAN (R 4.0.2)                      
 rsample       * 0.0.8      2020-09-23 [1] CRAN (R 4.0.2)                      
 rstudioapi      0.11       2020-02-07 [1] CRAN (R 4.0.2)                      
 scales        * 1.1.1      2020-05-11 [1] CRAN (R 4.0.2)                      
 sessioninfo     1.1.1      2018-11-05 [1] CRAN (R 4.0.2)                      
 stringi         1.5.3      2020-09-09 [1] CRAN (R 4.0.2)                      
 stringr         1.4.0      2019-02-10 [1] CRAN (R 4.0.2)                      
 survival        3.1-12     2020-04-10 [2] CRAN (R 4.0.2)                      
 TeachingDemos   2.12       2020-04-07 [1] CRAN (R 4.0.2)                      
 tibble        * 3.0.3      2020-07-10 [1] CRAN (R 4.0.2)                      
 tidymodels    * 0.1.1      2020-07-14 [1] CRAN (R 4.0.2)                      
 tidyr         * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)                      
 tidyselect      1.1.0      2020-05-11 [1] CRAN (R 4.0.2)                      
 timeDate        3043.102   2018-02-21 [1] CRAN (R 4.0.2)                      
 tune          * 0.1.1      2020-07-08 [1] CRAN (R 4.0.2)                      
 vctrs           0.3.4      2020-08-29 [1] CRAN (R 4.0.2)                      
 withr           2.3.0      2020-09-22 [1] CRAN (R 4.0.2)                      
 workflows     * 0.2.0      2020-09-15 [1] CRAN (R 4.0.2)                      
 yardstick     * 0.0.7      2020-07-13 [1] CRAN (R 4.0.2) 

Thanks! How about running remotes::install_dev("tune") and trying again?
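
After installing a development version, it can help to restart R and confirm which versions are actually installed. A small base-R sketch; the package list is only an illustration drawn from the packages mentioned in this thread:

```r
# Packages discussed in this thread; adjust as needed
pkgs <- c("tune", "baguette", "parsnip", "rpart")

# Report the installed version of each package, or NA if it is not installed
versions <- vapply(
  pkgs,
  function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  },
  character(1)
)

print(versions)
```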

The reprex you provided runs just fine now, but on my dataframe I still get the following error:

x Fold01: model  1/25: Error: All of the models failed. An example message was:
  Error in `[.data.frame`(m, labs) : undefined columns selected

It seems to have a problem with my column names, but I can't think why that would be the case, or why it would happen only during tuning and not when a single bagged model is trained. Here are the column names:

 [1] "HQ"              "Facility_Type"        "QUAL_SCORE_y0"        "QUAL_SCORE_y1"        "QUAL_SCORE_y2"        "QUAL_SCORE_y3"        "ERV_y0"               "ERV_y1"               "ERV_y2"               "ERV_y3"               "Rating_Method_y0"     "Rating_Method_y1"    
[13] "Rating_Method_y2"     "Rating_Method_y3"     "SUST_ACF"             "SUST_RQMT"            "PRORATED_SERVICE_PRV" "FSM_CALC_PRV"         "Facility_Age_yrs"     "sc_chg_1"             "sc_chg_2"             "all_score"            "EVR_chg_0"            "EVR_chg_1"           
[25] "EVR_chg_2"            "all_erv" 

I am getting exactly the same error. Earlier, a custom model was working perfectly fine; now I am suddenly getting this error. I haven't created a reprex. My first thought was to search online for the error, and I discovered that this is an ongoing conversation (i.e. within the last few minutes).

I don't believe any changes I made to my code should have caused the error. I don't think I updated tune or any of the tidymodels packages since it was working, but it is possible that I did. I am running on Linux, not Windows, and using the GitHub version of tidymodels.

I don't see anything wrong with those names. Does it run sequentially?
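
One way to check the sequential case is to replace any registered foreach backend and also tell tune not to parallelize. This is a sketch on mtcars, reusing the reprex from earlier in the thread rather than the original data; the 5-fold setting is an arbitrary choice to keep it quick:

```r
library(tidymodels)
library(baguette)

# Replace any previously registered parallel backend with the sequential one
foreach::registerDoSEQ()

bagged <-
  bag_tree(cost_complexity = tune()) %>%
  set_engine("rpart", times = 5) %>%
  set_mode("regression")

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

set.seed(2)
tuned <- tune_grid(
  bagged,
  mpg ~ .,
  resamples = folds,
  grid = 3,
  # Ask tune not to use a parallel backend even if one is registered
  control = control_grid(allow_par = FALSE)
)

tuned
```

If this runs but the parallel version does not, the problem is more likely in how worker processes are set up than in the model specification.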

The tuning code fails both sequentially and in parallel, but the model fits when not tuning, i.e. if I specify tree_depth and then pipe to fit() rather than tuning.
Here is an example of the errors from the .notes column:

I don't know that I can help more without being able to run an example locally that has the same issue.

I think the problem is that allow_sparse_x is now a required entry in the options of set_encoding() when defining a custom model, and it doesn't have a default value. It is late at night, so I'm not going to investigate this further, but I think this may be the cause of the error, in my case anyway.
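
For anyone registering a custom parsnip model, here is a sketch of an encoding declaration that names allow_sparse_x explicitly. The model name `my_custom_model` and the engine name `my_engine` are hypothetical placeholders, the option values are only examples, and whether this resolves the error above is an assumption:

```r
library(parsnip)

# Register a hypothetical model, mode, and engine (names are placeholders)
set_new_model("my_custom_model")
set_model_mode(model = "my_custom_model", mode = "regression")
set_model_engine(model = "my_custom_model", mode = "regression", eng = "my_engine")

# Declare how predictors are encoded, spelling out every option
# including allow_sparse_x
set_encoding(
  model = "my_custom_model",
  mode = "regression",
  eng = "my_engine",
  options = list(
    predictor_indicators = "traditional",
    compute_intercept = TRUE,
    remove_intercept = TRUE,
    allow_sparse_x = FALSE
  )
)

# Inspect what was registered
enc <- get_encoding("my_custom_model")
enc
```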

It's a frustrating error because it only occurs when using tune and not when fitting a single model. I suppose I could make my own wrapper to search the hyperparameters but that defeats one of the main reasons for switching to tidymodels.
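
As a diagnostic rather than a replacement for tune, a small hand-rolled loop can show which tree_depth values fail to fit. This is a sketch on mtcars, not the original data; `fit_one()` is a hypothetical helper and the depth values are arbitrary:

```r
library(tidymodels)
library(baguette)

# Arbitrary depths to try; a real check would use the values in the tuning grid
depths <- c(1, 5, 10, 15)

# Hypothetical helper: fit one bagged tree at a fixed depth
fit_one <- function(depth) {
  bag_tree(tree_depth = depth) %>%
    set_mode("regression") %>%
    set_engine("rpart", times = 5) %>%
    fit(mpg ~ ., data = mtcars)
}

set.seed(1)
# safely() captures errors instead of stopping at the first failure
attempts <- purrr::map(depths, purrr::safely(fit_one))

# TRUE where the fit raised an error
failed <- purrr::map_lgl(attempts, ~ !is.null(.x$error))

tibble(tree_depth = depths, failed = failed)
```

If only some depths fail, the error message stored in the corresponding `attempts[[i]]$error` may be easier to trace than the summary printed during tuning.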

I even tried

library(janitor)
df <- clean_names(df)

as well as stripping out "_" and all numbers from column names but nothing works. There appears to be something wrong with how tune and rpart are interacting with column names but I have not found a solution so far.

FWIW, I am seeing the same original error (rlang::env_get......) when trying to run a naive_Bayes() model in parallel. It runs without the error sequentially. (Ubuntu)