Extremely Randomized Trees in Random Forest

Hi,
Is it possible to run Extremely Randomized Trees within tidymodels, using either ranger or randomForest as the engine? I can't find it as a method parameter.

Thanks in advance,
John


You pass it in as an engine-specific argument; the most recent update to tune added support for tuning these. (I had linked a blog post here, but it was the wrong one.)

Here's an example of fitting and tuning the model:

library(tidymodels)
#> ── Attaching packages ─────────────────────────────────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0      ✓ recipes   0.1.13
#> ✓ dials     0.0.8      ✓ rsample   0.0.7 
#> ✓ dplyr     1.0.1      ✓ tibble    3.0.3 
#> ✓ ggplot2   3.3.2      ✓ tidyr     1.1.1 
#> ✓ infer     0.5.2      ✓ tune      0.1.1 
#> ✓ modeldata 0.0.2      ✓ workflows 0.1.2 
#> ✓ parsnip   0.1.3      ✓ yardstick 0.0.7 
#> ✓ purrr     0.3.4
#> ── Conflicts ────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()
data(ad_data)

set.seed(23923)
folds <- vfold_cv(ad_data)

rf_spec <- 
  rand_forest() %>% 
  set_engine("ranger", splitrule = "extratrees") %>% 
  set_mode("classification")

set.seed(3892)
rf_spec %>% 
  fit(Class ~ ., data = ad_data)
#> parsnip model object
#> 
#> Fit time:  242ms 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(formula = Class ~ ., data = data, splitrule = ~"extratrees",      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1), probability = TRUE) 
#> 
#> Type:                             Probability estimation 
#> Number of trees:                  500 
#> Sample size:                      333 
#> Number of independent variables:  130 
#> Mtry:                             11 
#> Target node size:                 10 
#> Variable importance mode:         none 
#> Splitrule:                        extratrees 
#> Number of random splits:          1 
#> OOB prediction error (Brier s.):  0.1477684
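
The output above shows "Number of random splits: 1", which is ranger's default for extratrees. If you want more random candidate split points per predictor, ranger's num.random.splits argument can be passed through set_engine() in the same way. A quick sketch (the value of 5 is just for illustration, not a recommendation):

rf_extra_spec <- 
  rand_forest() %>% 
  # extra arguments in set_engine() are passed straight to ranger::ranger()
  set_engine("ranger", splitrule = "extratrees", num.random.splits = 5) %>% 
  set_mode("classification")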

rf_spec <- 
  rand_forest() %>% 
  set_engine("ranger", splitrule = tune()) %>% 
  set_mode("classification")

# Tune by passing in the values via a grid:
set.seed(18)
tune_res <-
  rf_spec %>%
  tune_grid(
    Class ~ .,
    resamples = folds,
    # Use classification splitting rules only
    grid = tibble(splitrule = c("gini", "extratrees", "hellinger"))
  )
collect_metrics(tune_res)
#> # A tibble: 6 x 7
#>   splitrule  .metric  .estimator  mean     n std_err .config
#>   <chr>      <chr>    <chr>      <dbl> <int>   <dbl> <chr>  
#> 1 gini       accuracy binary     0.819    10  0.0237 Model1 
#> 2 gini       roc_auc  binary     0.880    10  0.0365 Model1 
#> 3 extratrees accuracy binary     0.768    10  0.0225 Model2 
#> 4 extratrees roc_auc  binary     0.862    10  0.0375 Model2 
#> 5 hellinger  accuracy binary     0.822    10  0.0221 Model3 
#> 6 hellinger  roc_auc  binary     0.876    10  0.0341 Model3
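
If you just want to pull out the winning split rule from those results and plug it back into the specification, something like this should work (a quick, untested sketch; finalize_model() may also handle the engine argument, but I haven't checked):

# best split rule by area under the ROC curve
best_rule <- select_best(tune_res, metric = "roc_auc")

final_spec <- 
  rand_forest() %>% 
  set_engine("ranger", splitrule = best_rule$splitrule) %>% 
  set_mode("classification")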

# Tune by passing a parameter set (also useful for Bayesian optimization; see the rough sketch at the end of this post)

# Automatically uses this dials function:
dials::splitting_rule()
#> Splitting Rule  (qualitative)
#> 7 possible values include:
#> 'variance', 'extratrees', 'maxstat', 'beta', 'gini', 'extratrees' and 'hellin...

# But we need to update the possible values
class_rules <- splitting_rule(values = c("gini", "extratrees", "hellinger"))

rf_param <- 
  rf_spec %>% 
  parameters() %>% 
  update(splitrule = class_rules)

set.seed(18)
tune_res <-
  rf_spec %>%
  tune_grid(
    Class ~ .,
    resamples = folds,
    param_info = rf_param
  )
collect_metrics(tune_res)
#> # A tibble: 6 x 7
#>   splitrule  .metric  .estimator  mean     n std_err .config
#>   <chr>      <chr>    <chr>      <dbl> <int>   <dbl> <chr>  
#> 1 extratrees accuracy binary     0.780    10  0.0185 Model1 
#> 2 extratrees roc_auc  binary     0.866    10  0.0415 Model1 
#> 3 hellinger  accuracy binary     0.834    10  0.0254 Model2 
#> 4 hellinger  roc_auc  binary     0.884    10  0.0333 Model2 
#> 5 gini       accuracy binary     0.816    10  0.0272 Model3 
#> 6 gini       roc_auc  binary     0.877    10  0.0381 Model3

Created on 2020-08-09 by the reprex package (v0.3.0)
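
For the Bayesian optimization route mentioned in the comment above, here is a rough, untested sketch. The mtry range and iteration count are arbitrary, and I've added mtry and min_n to the search so that the space isn't purely qualitative:

rf_bayes_spec <- 
  rand_forest(mtry = tune(), min_n = tune()) %>% 
  set_engine("ranger", splitrule = tune()) %>% 
  set_mode("classification")

rf_bayes_param <- 
  rf_bayes_spec %>% 
  parameters() %>% 
  update(
    mtry = mtry(c(1L, 20L)),   # illustrative range; ad_data has 130 predictors
    splitrule = class_rules    # classification splitting rules only, as before
  )

set.seed(18)
bayes_res <- 
  rf_bayes_spec %>% 
  tune_bayes(
    Class ~ .,
    resamples = folds,
    param_info = rf_bayes_param,
    initial = 5,   # size of the initial grid before the Gaussian process kicks in
    iter = 10
  )
collect_metrics(bayes_res)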
