Hi,
Is it possible to run Extremely Randomized Trees within tidymodels using either ranger or randomForest as the engine? I can't find it as a method parameter.
Thanks in advance,
John
With the ranger engine, you pass it in as an engine-specific argument to `set_engine()` (`splitrule = "extratrees"`). The last update to tune also allowed engine-specific arguments like this one to be tuned; see this blog post.
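Under the hood, parsnip forwards engine-specific arguments such as `splitrule` straight to `ranger::ranger()`. Here is a minimal sketch of the equivalent direct call. It uses the built-in `iris` data as a stand-in, which is an assumption and not part of the original example:

```r
library(ranger)

# Extremely randomized trees via ranger directly: with splitrule = "extratrees",
# ranger draws num.random.splits random cut points per candidate variable and
# keeps the best of them, instead of searching every possible cut point
set.seed(3892)
et_fit <- ranger(
  Species ~ ., data = iris,
  num.trees = 500,
  splitrule = "extratrees",
  num.random.splits = 1
)

et_fit
```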
Here's an example for fitting and tuning the model:
```r
library(tidymodels)
#> ── Attaching packages ─────────────────────────────────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom 0.7.0 ✓ recipes 0.1.13
#> ✓ dials 0.0.8 ✓ rsample 0.0.7
#> ✓ dplyr 1.0.1 ✓ tibble 3.0.3
#> ✓ ggplot2 3.3.2 ✓ tidyr 1.1.1
#> ✓ infer 0.5.2 ✓ tune 0.1.1
#> ✓ modeldata 0.0.2 ✓ workflows 0.1.2
#> ✓ parsnip 0.1.3 ✓ yardstick 0.0.7
#> ✓ purrr 0.3.4
#> ── Conflicts ────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x recipes::step() masks stats::step()
data(ad_data)
set.seed(23923)
folds <- vfold_cv(ad_data)

rf_spec <-
  rand_forest() %>%
  set_engine("ranger", splitrule = "extratrees") %>%
  set_mode("classification")

set.seed(3892)
rf_spec %>%
  fit(Class ~ ., data = ad_data)
#> parsnip model object
#>
#> Fit time: 242ms
#> Ranger result
#>
#> Call:
#> ranger::ranger(formula = Class ~ ., data = data, splitrule = ~"extratrees", num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
#>
#> Type: Probability estimation
#> Number of trees: 500
#> Sample size: 333
#> Number of independent variables: 130
#> Mtry: 11
#> Target node size: 10
#> Variable importance mode: none
#> Splitrule: extratrees
#> Number of random splits: 1
#> OOB prediction error (Brier s.): 0.1477684
```
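The fitted model reports `Number of random splits: 1`, which is ranger's default for its `num.random.splits` argument. That argument only matters for extra trees, and it can be passed through `set_engine()` in the same way as `splitrule`. Here is a minimal sketch. The value 3 is arbitrary, and `translate()` is used only to show the resulting ranger call without fitting anything:

```r
library(parsnip)

# Extra trees with three random candidate cut points per variable instead of
# ranger's default of one; both engine arguments are forwarded to ranger
et_spec <-
  rand_forest() %>%
  set_engine("ranger", splitrule = "extratrees", num.random.splits = 3) %>%
  set_mode("classification")

# Print the model fit template parsnip would use, without fitting a model
translate(et_spec)
```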
```r
rf_spec <-
  rand_forest() %>%
  set_engine("ranger", splitrule = tune()) %>%
  set_mode("classification")

# Tune by passing in the values via a grid:
set.seed(18)
tune_res <-
  rf_spec %>%
  tune_grid(
    Class ~ .,
    resamples = folds,
    # Use classification splitting rules only
    grid = tibble(splitrule = c("gini", "extratrees", "hellinger"))
  )
collect_metrics(tune_res)
#> # A tibble: 6 x 7
#> splitrule .metric .estimator mean n std_err .config
#> <chr> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 gini accuracy binary 0.819 10 0.0237 Model1
#> 2 gini roc_auc binary 0.880 10 0.0365 Model1
#> 3 extratrees accuracy binary 0.768 10 0.0225 Model2
#> 4 extratrees roc_auc binary 0.862 10 0.0375 Model2
#> 5 hellinger accuracy binary 0.822 10 0.0221 Model3
#> 6 hellinger roc_auc binary 0.876 10 0.0341 Model3
```
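After tuning, the usual next step is to pick the best splitting rule and fit the final model. Here is a minimal sketch using tune's `select_best()` and `finalize_model()`. It re-creates the grid-tuning setup so that it runs on its own. Using `roc_auc` as the selection metric is an assumption, and the sketch does not predict which rule will win:

```r
library(tidymodels)
data(ad_data, package = "modeldata")

set.seed(23923)
folds <- vfold_cv(ad_data)

rf_spec <-
  rand_forest() %>%
  set_engine("ranger", splitrule = tune()) %>%
  set_mode("classification")

set.seed(18)
tune_res <-
  rf_spec %>%
  tune_grid(
    Class ~ .,
    resamples = folds,
    grid = tibble(splitrule = c("gini", "extratrees", "hellinger"))
  )

# Keep the splitting rule with the highest mean ROC AUC across the folds
best_rule <- select_best(tune_res, metric = "roc_auc")

# Replace tune() with the chosen rule, then fit on the full data set
final_fit <-
  rf_spec %>%
  finalize_model(best_rule) %>%
  fit(Class ~ ., data = ad_data)

final_fit
```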
```r
# Tune by passing a parameter set (also useful for Bayesian optimization).
# This automatically uses the dials function:
dials::splitting_rule()
#> Splitting Rule (qualitative)
#> 7 possible value include:
#> 'variance', 'extratrees', 'maxstat', 'beta', 'gini', 'extratrees' and 'hellin...
# The default values include regression rules, so restrict them
# to the classification splitting rules:
class_rules <- splitting_rule(values = c("gini", "extratrees", "hellinger"))
rf_param <-
  rf_spec %>%
  parameters() %>%
  update(splitrule = class_rules)

set.seed(18)
tune_res <-
  rf_spec %>%
  tune_grid(
    Class ~ .,
    resamples = folds,
    param_info = rf_param
  )
collect_metrics(tune_res)
#> # A tibble: 6 x 7
#> splitrule .metric .estimator mean n std_err .config
#> <chr> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 extratrees accuracy binary 0.780 10 0.0185 Model1
#> 2 extratrees roc_auc binary 0.866 10 0.0415 Model1
#> 3 hellinger accuracy binary 0.834 10 0.0254 Model2
#> 4 hellinger roc_auc binary 0.884 10 0.0333 Model2
#> 5 gini accuracy binary 0.816 10 0.0272 Model3
#> 6 gini roc_auc binary 0.877 10 0.0381 Model3
```
Created on 2020-08-09 by the reprex package (v0.3.0)
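To see which candidates the updated parameter set allows, dials can enumerate them directly. Here is a minimal sketch. It assumes a recent tidymodels release, in which `extract_parameter_set_dials()` takes over the role that `parameters()` plays on a model specification in the reprex above:

```r
library(tidymodels)

rf_spec <-
  rand_forest() %>%
  set_engine("ranger", splitrule = tune()) %>%
  set_mode("classification")

# Restrict the qualitative parameter to rules valid for classification
class_rules <- splitting_rule(values = c("gini", "extratrees", "hellinger"))

# Newer replacement for parameters() on a model specification
rf_param <-
  rf_spec %>%
  extract_parameter_set_dials() %>%
  update(splitrule = class_rules)

# A regular grid lists each allowed splitting rule once
rule_grid <- grid_regular(rf_param)
rule_grid
```

When no `grid` is supplied, `tune_grid()` builds its own space-filling grid from `param_info`. With a single three-level qualitative parameter, that grid can only contain these same three rules, which matches the three configurations in the output above.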