Hello.
Are there any functional or hacky ways to extend the arguments accepted when specifying a parsnip model? The ranger::ranger function offers the max.depth argument to control tree depth. In my work, this has been an important argument for controlling over-fitting on imbalanced and noisy data. Currently, parsnip::rand_forest with the ranger::ranger engine offers arguments for mtry, trees, and min_n. Any thoughts on how to pass max.depth to ranger::ranger in a parsnip workflow would be great. Thanks!
Matt
<rant>
I don't include that parameter since it is antithetical to what random forest does: try to create a diverse set of trees. One of the main ways of doing this effectively is to use unpruned trees.
</rant>
I'm back... you can do that when you set the engine:
library(parsnip)
library(modeldata)
data(concrete)
set.seed(1452)
rand_forest(mtry = 3) %>%
  set_engine("ranger") %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)
#> parsnip model object
#>
#> Fit time: 374ms
#> Ranger result
#>
#> Call:
#> ranger::ranger(formula = formula, data = data, mtry = ~3, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
#>
#> Type: Regression
#> Number of trees: 500
#> Sample size: 1030
#> Number of independent variables: 8
#> Mtry: 3
#> Target node size: 5
#> Variable importance mode: none
#> Splitrule: variance
#> OOB prediction error (MSE): 21.22822
#> R squared (OOB): 0.9239355
# Extreme example to show that it works:
set.seed(1452)
rand_forest(mtry = 3) %>%
  set_engine("ranger", max.depth = 1) %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)
#> parsnip model object
#>
#> Fit time: 44ms
#> Ranger result
#>
#> Call:
#> ranger::ranger(formula = formula, data = data, mtry = ~3, max.depth = ~1, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
#>
#> Type: Regression
#> Number of trees: 500
#> Sample size: 1030
#> Number of independent variables: 8
#> Mtry: 3
#> Target node size: 5
#> Variable importance mode: none
#> Splitrule: variance
#> OOB prediction error (MSE): 191.3124
#> R squared (OOB): 0.3144934
Created on 2020-02-11 by the reprex package (v0.3.0)
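If you want to see how much the depth limit matters on your own data, one option is to fit the same model at a few max.depth values and compare the out-of-bag error. This is only a rough sketch, not a tuning recipe: the depth values are arbitrary picks for illustration, and it assumes ranger's documented convention that max.depth = 0 means unlimited depth. The !! injects the current value of d into the engine argument, since set_engine() captures its arguments as unevaluated expressions.

```r
library(parsnip)
library(modeldata)
data(concrete)

# Arbitrary depth limits for illustration; 0 is ranger's "unlimited" setting
depths <- c(1, 3, 5, 0)

oob_mse <- sapply(depths, function(d) {
  set.seed(1452)
  mod <- rand_forest(mtry = 3) %>%
    set_engine("ranger", max.depth = !!d) %>%  # !! inlines the value of d
    set_mode("regression") %>%
    fit(compressive_strength ~ ., data = concrete)
  # The underlying ranger object lives in mod$fit;
  # prediction.error is its OOB MSE for regression
  mod$fit$prediction.error
})

data.frame(max.depth = depths, oob_mse = oob_mse)
```

OOB error is a convenient quick check here, but resampling would be the more careful way to compare settings.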
Thank you @Max! Glad to see that set_engine() can pass additional arguments. That came in handy for importance = 'impurity' as well.
Regarding max.depth, I agree with your comment on tree diversity overall.
Thanks!
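For anyone landing here later, a minimal sketch of the importance case, passed through set_engine() the same way as max.depth. It assumes the "impurity" mode from ranger's documentation and reuses the concrete data from the reprex in this thread; the name rf_fit is just a placeholder.

```r
library(parsnip)
library(modeldata)
data(concrete)

set.seed(1452)
rf_fit <- rand_forest(mtry = 3) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)

# The underlying ranger object is stored in the `fit` element;
# variable.importance is a named numeric vector, one entry per predictor
sort(rf_fit$fit$variable.importance, decreasing = TRUE)
```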