Random forest with categorical variables

When running a random forest model with the randomForest package, do independent categorical variables need to be converted to factors? Does the dependent categorical variable need to be converted to a factor?

I would think the dependent variable does. I'm not as sure about the independent variables. Thank you.

Hello @fcas80,

You can see that the dependent categorical variable is in fact converted to a factor in this vignette: https://htmlpreview.github.io/?https://github.com/geneticsMiNIng/BlackBoxOpener/blob/master/randomForestExplainer/inst/doc/randomForestExplainer.html

I would assume you have to do the same for the independent variables, since otherwise the model won't know to treat them as categorical.

Yes for the outcome data. It depends slightly on the implementation, but most of those that I have seen require a factor.
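
To illustrate the point about the outcome, here is a minimal sketch assuming the randomForest package. The data frame `toy` and its columns are made up for this example. In randomForest, a factor response is fit as a classification forest, while a numeric response (even one coded 0/1) is fit as a regression forest; the `type` element of the fitted object shows which one was used.

```r
# Toy data: the outcome is stored as 0/1 integers (hypothetical example data).
set.seed(42)
toy <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100)
)
toy$y <- as.integer(toy$x1 + toy$x2 > 0)

# Convert the outcome to a factor so it is treated as a class label.
toy$y_factor <- factor(toy$y, levels = c(0, 1), labels = c("no", "yes"))

# Only fit the models if the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  # Factor outcome: a classification forest is fit.
  rf_class <- randomForest::randomForest(y_factor ~ x1 + x2, data = toy)
  print(rf_class$type)

  # Numeric outcome: a regression forest is fit instead
  # (randomForest warns when the response has very few unique values).
  rf_reg <- suppressWarnings(
    randomForest::randomForest(y ~ x1 + x2, data = toy)
  )
  print(rf_reg$type)
}
```

Other implementations may behave differently, so it is worth checking the fitted object rather than assuming.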

For the predictors (independent variables), that's up to you. This is extensively discussed here. For example:
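
As a rough sketch of what that choice looks like in practice (this is not taken from the linked discussion; the data frame `dat` and its column names are invented for illustration, and the randomForest package is assumed): a factor predictor is treated as a set of unordered categories, while a numeric predictor, including integer category codes, is split on ordered thresholds. Character columns generally need converting to factors before fitting.

```r
# Hypothetical data: one numeric predictor, one categorical predictor stored
# as text, and one categorical predictor stored as integer codes.
set.seed(1)
n <- 120
dat <- data.frame(
  size   = rnorm(n),
  colour = sample(c("red", "green", "blue"), n, replace = TRUE),
  region = sample(1:4, n, replace = TRUE),
  stringsAsFactors = FALSE
)
dat$outcome <- factor(ifelse(dat$size > 0 | dat$colour == "red", "yes", "no"))

# Turn every character column into a factor.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Integer codes stay numeric unless converted. As numeric, splits take the
# form region <= k; as a factor, the levels are unordered categories.
dat$region_f <- factor(dat$region)

str(dat)

# Only fit the model if the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(outcome ~ size + colour + region_f, data = dat)
  print(rf)
}
```

Whether to convert a coded variable such as `region` is the judgment call: keep it numeric if the order of the codes is meaningful, convert it to a factor if the codes are just labels.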


GreyMerchant, thanks for your reply. In the vignette you shared, the author does convert the dependent variable to a factor, but he does not convert any independent variables to factors.
