Hello.

Are there any functional or hacky ways to extend the arguments accepted when specifying a `parsnip` model? The `ranger::ranger` function offers a `max.depth` argument to control tree depth. In my work, this has been an important argument for controlling over-fitting on imbalanced and noisy data. Currently, `parsnip::rand_forest` with the `ranger` engine offers arguments for `mtry`, `trees`, and `min_n`. Any thoughts on how to pass `max.depth` to `ranger::ranger` in a `parsnip` workflow would be great. Thanks!

Matt

# Extending Parsnip `rand_forest` to include `max.depth` argument

`<rant>`

I don't include that parameter since it is antithetical to what random forest does: try to create a *diverse* set of trees. One of the main ways of doing this effectively is to use unpruned trees.

`</rant>`

I'm back... you can do that when you set the engine:

```r
library(parsnip)
library(modeldata)
data(concrete)
set.seed(1452)
rand_forest(mtry = 3) %>%
  set_engine("ranger") %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)
#> parsnip model object
#>
#> Fit time: 374ms
#> Ranger result
#>
#> Call:
#> ranger::ranger(formula = formula, data = data, mtry = ~3, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
#>
#> Type: Regression
#> Number of trees: 500
#> Sample size: 1030
#> Number of independent variables: 8
#> Mtry: 3
#> Target node size: 5
#> Variable importance mode: none
#> Splitrule: variance
#> OOB prediction error (MSE): 21.22822
#> R squared (OOB): 0.9239355

# Extreme example to show that it works:
set.seed(1452)
rand_forest(mtry = 3) %>%
  set_engine("ranger", max.depth = 1) %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)
#> parsnip model object
#>
#> Fit time: 44ms
#> Ranger result
#>
#> Call:
#> ranger::ranger(formula = formula, data = data, mtry = ~3, max.depth = ~1, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
#>
#> Type: Regression
#> Number of trees: 500
#> Sample size: 1030
#> Number of independent variables: 8
#> Mtry: 3
#> Target node size: 5
#> Variable importance mode: none
#> Splitrule: variance
#> OOB prediction error (MSE): 191.3124
#> R squared (OOB): 0.3144934
```

<sup>Created on 2020-02-11 by the reprex package (v0.3.0)</sup>
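If you want to check how an engine argument will be forwarded before fitting anything, `translate()` should print the call template that parsnip builds for the engine. This is only a rough sketch: the value `max.depth = 4` and the object name `spec` are arbitrary choices for illustration, not recommendations.

```r
library(parsnip)

# Model specification with an engine-specific argument.
# `max.depth = 4` is an arbitrary illustrative value.
spec <- rand_forest(mtry = 3) %>%
  set_engine("ranger", max.depth = 4) %>%
  set_mode("regression")

# Engine arguments are stored on the specification, separate from the
# main arguments (mtry, trees, min_n).
names(spec$eng_args)

# Print the ranger::ranger() call template without fitting the model.
translate(spec)
```

Nothing is fit here, so it is a cheap way to confirm that `max.depth` ends up in the `ranger::ranger()` call.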

Thank you @Max! Glad to see that `set_engine()` can pass additional arguments. That came in handy for `importance = 'impurity'` as well.

Regarding `max.depth`, I agree with your comment on tree diversity overall.
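For anyone landing here later, the same `set_engine()` route can be sketched for variable importance. This assumes ranger's impurity-based mode, `importance = "impurity"`, and reuses the `concrete` data from the example above; the object name `rf_fit` is just for illustration.

```r
library(parsnip)
library(modeldata)
data(concrete)

set.seed(1452)
rf_fit <- rand_forest(mtry = 3) %>%
  # `importance` is a ranger argument, passed through as an engine argument
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression") %>%
  fit(compressive_strength ~ ., data = concrete)

# The underlying ranger object lives in the `fit` element of the parsnip
# model; its `variable.importance` element is a named numeric vector.
sort(rf_fit$fit$variable.importance, decreasing = TRUE)
```

Without the `importance` argument, ranger does not compute importance scores (the output above shows "Variable importance mode: none"), so it has to be requested at the engine level like this.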

Thanks!
