How to remove categorical variables that are highly correlated with other categorical variables?

step_corr() can remove highly correlated continuous variables using Pearson or Spearman correlation. However, the recipes package does not provide a similar prefilter for categorical variables. I have 20 categorical columns (one-hot encoded), and I want to remove redundant columns that are correlated with each other. Can anyone give me some advice? Thanks

If you want to remove the entire predictor, you would have to write a custom recipe step to do that.
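
A full custom step is not shown here. As a lighter-weight sketch of the same idea, assuming it is acceptable to filter the data frame before it reaches the recipe, one could score each pair of factors with Cramér's V (a chi-squared-based association measure, swapped in here because step_corr() only handles numeric columns) and drop one factor from each strongly associated pair. `cramers_v()` and `drop_redundant_factors()` are hypothetical helpers, `colour_code` is a hypothetical relabelling of `colour`, and the 0.9 cutoff is an arbitrary choice.

```r
set.seed(1)
n <- 200
colour <- factor(sample(c("red", "green", "blue"), n, replace = TRUE))
df <- data.frame(
  y = rnorm(n),
  colour = colour,
  # Hypothetical redundant factor: a one-to-one relabelling of `colour`
  colour_code = factor(unname(c(red = "R", green = "G", blue = "B")[as.character(colour)])),
  size = factor(sample(c("S", "M", "L"), n, replace = TRUE))
)

# Hypothetical helper: Cramér's V between two factors,
# sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v <- function(x, y) {
  tab <- table(droplevels(x), droplevels(y))  # drop empty levels first
  k <- min(nrow(tab), ncol(tab))
  if (k < 2) return(0)                         # a single-level factor carries no information
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (k - 1))))
}

# Hypothetical helper: greedily keep the first factor of each strongly
# associated pair and drop the later one
drop_redundant_factors <- function(df, threshold = 0.9) {
  fac_cols <- names(df)[vapply(df, is.factor, logical(1))]
  dropped <- character(0)
  for (i in seq_along(fac_cols)) {
    a <- fac_cols[i]
    if (a %in% dropped) next
    for (b in fac_cols[-seq_len(i)]) {
      if (b %in% dropped) next
      if (cramers_v(df[[a]], df[[b]]) > threshold) dropped <- c(dropped, b)
    }
  }
  df[setdiff(names(df), dropped)]
}

reduced <- drop_redundant_factors(df)
names(reduced)
```

The filtered data frame can then be passed to recipe() as usual. Wrapping the same logic in a real recipe step would let it be estimated on the training set only and reapplied inside resampling.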

Alternatively, after you create indicator variables with step_dummy(), step_corr() could be applied to remove the indicator columns (individual factor levels) that carry redundant information.
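
A minimal sketch of that approach, assuming a small simulated data set in which `colour_code` is a hypothetical one-to-one relabelling of `colour` (so their indicator columns are identical), and using step_corr()'s default threshold of 0.9 written out explicitly:

```r
library(recipes)

set.seed(1)
n <- 200
colour <- factor(sample(c("red", "green", "blue"), n, replace = TRUE))
df <- data.frame(
  y = rnorm(n),
  colour = colour,
  # Hypothetical redundant factor: a one-to-one relabelling of `colour`
  colour_code = factor(unname(c(red = "R", green = "G", blue = "B")[as.character(colour)])),
  size = factor(sample(c("S", "M", "L"), n, replace = TRUE))
)

# Indicator columns only, for comparison
rec_dummy <- recipe(y ~ ., data = df) %>%
  step_dummy(all_nominal_predictors())

# Same recipe, then drop indicators whose absolute correlation exceeds 0.9
rec_filtered <- rec_dummy %>%
  step_corr(all_numeric_predictors(), threshold = 0.9)

dummied  <- bake(prep(rec_dummy), new_data = NULL)
filtered <- bake(prep(rec_filtered), new_data = NULL)

# Indicator columns removed as redundant
setdiff(names(dummied), names(filtered))
```

Because each indicator represents a single level, step_corr() judges levels one at a time, so it can keep some levels of a factor while dropping others. That is why this removes redundant levels rather than whole predictors.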

