Problem with dials

Hi, I'm trying to understand the default grid values for the rand_forest function, but I get an error. I wonder whether the outcome of the following code is a bug or whether I'm missing something:

Thank you in advance

library(tidymodels)
 
 model <- rand_forest(
   mode       = "regression",
   mtry = tune(),
   trees = tune()) %>%
   set_engine(engine = "ranger")
 grid_latin_hypercube(parameters(model), size = 100)
Error: These arguments contains unknowns: `mtry`. See the `finalize()` function.
Run `rlang::last_error()` to see where the error occurred.

My session is:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Colombia.1252  LC_CTYPE=Spanish_Colombia.1252   
[3] LC_MONETARY=Spanish_Colombia.1252 LC_NUMERIC=C                     
[5] LC_TIME=Spanish_Colombia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] yardstick_0.0.8    workflowsets_0.0.2 workflows_0.2.2    tune_0.1.5        
 [5] tidyr_1.1.3        tibble_3.1.2       rsample_0.1.0      recipes_0.1.16    
 [9] purrr_0.3.4        parsnip_0.1.6      modeldata_0.1.0    infer_0.5.4       
[13] ggplot2_3.3.3      dplyr_1.0.6        dials_0.0.9        scales_1.1.1      
[17] broom_0.7.6        tidymodels_0.1.3  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6         lubridate_1.7.10   lattice_0.20-44    listenv_0.8.0     
 [5] class_7.3-19       assertthat_0.2.1   digest_0.6.27      ipred_0.9-11      
 [9] foreach_1.5.1      utf8_1.2.1         parallelly_1.25.0  R6_2.5.0          
[13] plyr_1.8.6         backports_1.2.1    pillar_1.6.1       rlang_0.4.11      
[17] rstudioapi_0.13    DiceDesign_1.9     furrr_0.2.2        rpart_4.1-15      
[21] Matrix_1.3-3       splines_4.1.0      gower_0.2.2        munsell_0.5.0     
[25] compiler_4.1.0     pkgconfig_2.0.3    globals_0.14.0     nnet_7.3-16       
[29] tidyselect_1.1.1   prodlim_2019.11.13 codetools_0.2-18   GPfit_1.0-8       
[33] fansi_0.5.0        future_1.21.0      crayon_1.4.1       withr_2.4.2       
[37] MASS_7.3-54        grid_4.1.0         gtable_0.3.0       lifecycle_1.0.0   
[41] DBI_1.1.1          magrittr_2.0.1     pROC_1.17.0.1      cli_2.5.0         
[45] timeDate_3043.102  ellipsis_0.3.2     lhs_1.1.1          generics_0.1.0    
[49] vctrs_0.3.8        lava_1.6.9         iterators_1.0.13   tools_4.1.0       
[53] glue_1.4.2         parallel_4.1.0     survival_3.2-11    colorspace_2.0-1

The documentation for that grid function lists the inputs as:

One or more param objects (such as mtry() or penalty()). None of the objects can have unknown() values in the parameter ranges or values.

In your example, you are tuning over mtry. Its upper limit depends on the number of predictor columns in your data set, so it stays unknown until the data are supplied:

> mtry()
# Randomly Selected Predictors (quantitative)
Range: [1, ?]

The error message points to finalize() and there are examples in that help topic.
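
As a minimal sketch of what finalize() does to a single param object (assuming the dials version from this thread, and assuming that every column of mtcars except mpg is a predictor), the unknown upper limit is filled in from the number of columns. The names predictors, mtry_final and rf_grid are only illustrative:

```r
library(dials)

# mtry() starts with an unknown upper limit: Range: [1, ?]
mtry_param <- mtry()

# Assumption: all columns of mtcars other than mpg are predictors
predictors <- mtcars[, names(mtcars) != "mpg"]

# finalize() sets the upper limit from the number of predictor columns
mtry_final <- finalize(mtry_param, x = predictors)

# With no unknowns left, the param objects can be passed to the grid function
set.seed(1)
rf_grid <- grid_latin_hypercube(mtry_final, trees(), size = 5)
rf_grid
```

Passing the param objects directly like this skips the model specification entirely, which may be enough if you only want to look at the grid values.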

There is a lot more about this in Tidy Modeling with R.

Let's say that you were modeling the mtcars data set:

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

model <-
   rand_forest(mode = "regression",
               mtry = tune(),
               trees = tune()) %>%
   set_engine(engine = "ranger")

# Get rid of the unknown

rf_param <- 
   model %>% 
   parameters() 

# mtry still needs to be finalized:
rf_param
#> Collection of 2 parameters for tuning
#> 
#>  identifier  type    object
#>        mtry  mtry nparam[?]
#>       trees trees nparam[+]
#> 
#> Model parameters needing finalization:
#>    # Randomly Selected Predictors ('mtry')
#> 
#> See `?dials::finalize` or `?dials::update.parameters` for more information.

rf_param <- 
   rf_param %>% 
   # Give it the predictors to finalize mtry
   finalize(x = mtcars %>% select(-mpg))
rf_param
#> Collection of 2 parameters for tuning
#> 
#>  identifier  type    object
#>        mtry  mtry nparam[+]
#>       trees trees nparam[+]

rf_param %>% 
   grid_latin_hypercube(size = 3)
#> # A tibble: 3 x 2
#>    mtry trees
#>   <int> <int>
#> 1    10   167
#> 2     2  1565
#> 3     6  1159

Created on 2021-07-01 by the reprex package (v2.0.0)
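
The printed message above also mentions `?dials::update.parameters`. As a rough sketch of that alternative (not part of the reprex above), the range can be set by hand instead of being derived from the data. This assumes you already know that mtcars has 10 predictors once mpg is set aside; rf_param_manual and manual_grid are illustrative names:

```r
library(tidymodels)

model <-
   rand_forest(mode = "regression",
               mtry = tune(),
               trees = tune()) %>%
   set_engine(engine = "ranger")

# Assumption: 10 predictors, so mtry can range from 1 to 10
rf_param_manual <-
   model %>%
   parameters() %>%
   update(mtry = mtry(c(1L, 10L)))

set.seed(1)
manual_grid <-
   rf_param_manual %>%
   grid_latin_hypercube(size = 3)
manual_grid
```

This is mostly useful when you want a narrower range than the full number of predictors. In more recent tidymodels releases, extract_parameter_set_dials() is, as far as I know, the suggested replacement for calling parameters() on a model specification.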

Thank you, Max. With your help I got it working:

library(tidymodels)

# Predictors only: drop the outcome column
car_pred <- select(mtcars, -mpg)

model <- rand_forest(
  mode       = "regression",
  mtry = tune(),
  trees = tune()) %>%
  set_engine(engine = "ranger") 

set.seed(12345)
print(model %>% parameters() %>% finalize(car_pred) %>%
        grid_latin_hypercube(size = 10) %>% arrange(mtry))
