Caret preProcess function to impute factor NAs


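My question is about the preProcess argument of train() (and the preProcess() function) in the caret package. It can request imputation by median ("medianImpute"), k-nearest neighbors ("knnImpute"), or bagged trees ("bagImpute").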

If a dataset has mixed data (categorical and numerical predictors), and both kinds of predictors have NAs, what does caret do behind the scenes with the categorical/factor variables?

After reading the caret documentation, I think caret currently ignores factor variables (at least for standardization). If this is correct, is there no imputation for categorical predictors?
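A quick way to check this behavior is to run preProcess() on a small mixed data frame. The sketch below uses a made-up toy data set (the column names num and fac are hypothetical) and assumes a reasonably recent caret version, where non-numeric columns are left untouched by preProcess():

```r
library(caret)

# Toy data (made up for illustration): one numeric and one factor
# predictor, each containing missing values.
df <- data.frame(
  num = c(1.5, NA, 3.2, 4.8, 5.1, NA),
  fac = factor(c("a", "b", NA, "a", NA, "b"))
)

# Estimate the imputation from the data, then apply it.
pp  <- preProcess(df, method = "medianImpute")
out <- predict(pp, df)

# Count the NAs remaining in each column after imputation.
colSums(is.na(out))
```

If the factor column still shows missing values in the final count, that matches the reading of the documentation: the numeric predictor is imputed and the factor is passed through unchanged.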

I think the mice package does impute categorical variables, using multinomial logistic regression.

In general terms, is it wise to impute categorical predictors? And what is the recommended approach with caret?

train assumes that you've already decomposed the factors into dummy variables, unless you are using bagged trees to impute. That kinda stinks so, instead, you could use KNN imputation with recipes and train. That imputation uses Gower distance, so the factor predictors do not need to be converted into a numeric encoding first.
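A minimal sketch of the recipes route, under some assumptions: the data are simulated, the column names (y, num, num2, fac) are hypothetical, and the KNN step is called step_impute_knn(), which is its name in recent recipes releases (older releases called it step_knnimpute()). The model choice ("lm") and the resampling settings are only placeholders:

```r
library(caret)
library(recipes)

set.seed(1)
n <- 200

# Simulated data (made up): two numeric predictors and one factor.
dat <- data.frame(
  num  = rnorm(n),
  num2 = rnorm(n),
  fac  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
dat$y <- 2 * dat$num + rnorm(n)

# Knock out some values in a numeric and a factor predictor.
dat$num[sample(n, 15)] <- NA
dat$fac[sample(n, 15)] <- NA

# KNN imputation for all predictors; factors stay factors.
rec <- recipe(y ~ ., data = dat) %>%
  step_impute_knn(all_predictors(), neighbors = 5)

# Inspect the imputed training set on its own.
imputed <- bake(prep(rec, training = dat), new_data = NULL)
anyNA(imputed)

# Pass the recipe to train(); the imputation is re-estimated
# inside each resample rather than once on the full data.
fit <- train(
  rec,
  data      = dat,
  method    = "lm",
  trControl = trainControl(method = "cv", number = 5)
)
```

Handing the recipe to train(), rather than imputing up front, keeps the imputation inside the resampling loop, so the performance estimates account for it.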

Yes, absolutely.
