Extending Parsnip `rand_forest` to include `max.depth` argument

Hello.
Are there any functional or hacky ways to extend the arguments accepted when specifying a parsnip model? The ranger::ranger function offers a max.depth argument to limit tree depth. In my work, this has been an important lever for controlling over-fitting on imbalanced and noisy data. Currently, parsnip::rand_forest with the ranger engine offers arguments for mtry, trees, and min_n. Any thoughts on how to pass max.depth to ranger::ranger in a parsnip workflow would be great. Thanks!
Matt

<rant>

I don't include that parameter because it is antithetical to what random forest tries to do: create a diverse set of trees. One of the main ways of doing this effectively is to use unpruned trees.

</rant>

I'm back... you can pass engine-specific arguments such as max.depth when you set the engine:

library(parsnip)
library(modeldata)
data(concrete)

set.seed(1452)
rand_forest(mtry = 3) %>% 
  set_engine("ranger") %>% 
  set_mode("regression") %>% 
  fit(compressive_strength ~ ., data = concrete)
#> parsnip model object
#> 
#> Fit time:  374ms 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(formula = formula, data = data, mtry = ~3, num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  500 
#> Sample size:                      1030 
#> Number of independent variables:  8 
#> Mtry:                             3 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       21.22822 
#> R squared (OOB):                  0.9239355

# Extreme example to show that it works:
set.seed(1452)
rand_forest(mtry = 3) %>% 
  set_engine("ranger", max.depth = 1) %>% 
  set_mode("regression") %>% 
  fit(compressive_strength ~ ., data = concrete)
#> parsnip model object
#> 
#> Fit time:  44ms 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(formula = formula, data = data, mtry = ~3, max.depth = ~1,      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  500 
#> Sample size:                      1030 
#> Number of independent variables:  8 
#> Mtry:                             3 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       191.3124 
#> R squared (OOB):                  0.3144934

Created on 2020-02-11 by the reprex package (v0.3.0)
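To see the trade-off described in the rant on this particular data set, here is a minimal sketch that sweeps max.depth and compares out-of-bag R squared. It assumes the same parsnip/ranger/modeldata setup as the reprex above. The depth grid is an arbitrary choice for illustration, and it relies on ranger's documented convention that max.depth = 0 means unlimited depth. The `r.squared` value is read from the ranger object that parsnip stores in the `fit` element of the fitted model.

```r
library(parsnip)
library(modeldata)
data(concrete)

# Illustrative depth grid; in ranger, max.depth = 0 means unlimited depth
depths <- c(1, 2, 4, 8, 0)

oob_rsq <- sapply(depths, function(d) {
  set.seed(1452)
  fit_d <- rand_forest(mtry = 3) %>%
    # Engine-specific arguments are passed through set_engine()
    set_engine("ranger", max.depth = d) %>%
    set_mode("regression") %>%
    fit(compressive_strength ~ ., data = concrete)
  # The underlying ranger object lives in fit_d$fit; r.squared is computed OOB
  fit_d$fit$r.squared
})

data.frame(max_depth = depths, oob_r_squared = oob_rsq)
```

Results will depend on the data: shallow trees reduce variance per tree but, as the rant notes, they also make the trees less diverse and can underfit badly, as the stump example above shows.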

Thank you @Max! Glad to see that set_engine() can pass additional arguments. That came in handy for importance = 'impurity' as well.
Regarding max.depth, I agree with your comment on tree diversity overall.
Thanks!
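A minimal sketch of the importance case mentioned above, assuming the same setup as the earlier reprex: ranger's `importance` argument is passed through set_engine(), and the resulting scores are read from the `variable.importance` element of the ranger object stored in the parsnip fit. The mtry value and seed simply mirror the reprex.

```r
library(parsnip)
library(modeldata)
data(concrete)

set.seed(1452)
rf_fit <- rand_forest(mtry = 3) %>%
  # ranger's importance modes include "impurity" and "permutation"
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)

# Named numeric vector with one impurity-based score per predictor
imp <- rf_fit$fit$variable.importance
sort(imp, decreasing = TRUE)
```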
