I sometimes use contr.sum factor coding to make the main effect terms more intuitive in linear regression models with interaction terms.
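For concreteness, here is a minimal sketch of the two codings I am comparing. The toy data frame `df` and the factor `g` are made up for illustration; the design matrices are the kind of numeric input I would then pass on to a model.

```r
# Hypothetical toy factor with three levels (illustration only)
df <- data.frame(g = factor(c("a", "b", "c", "a")))

# Default treatment (dummy) coding: level "a" is the reference level
X_trt <- model.matrix(~ g, data = df)

# Sum-to-zero coding: the last level "c" is coded -1 in every contrast column
X_sum <- model.matrix(~ g, data = df,
                      contrasts.arg = list(g = "contr.sum"))

print(X_trt)
print(X_sum)
```

Under treatment coding each non-reference level gets its own 0/1 indicator column, whereas under contr.sum the last level has no column of its own and is represented as -1 across all contrast columns.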
Is there any benefit to using contr.sum coding in regression tree (xgboost) models? Does it have any effect on how variable importance or SHAP values are interpreted?
https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/