My question concerns the packages randomForest, rfPermute and randomForestSRC, in particular variable importance (VIMP) and confidence intervals (CI). It is based on my reading of chapter 15 of The Elements of Statistical Learning and the article by Ishwaran and Lu in Statistics in Medicine (2018).
I would like to compare variable importance across variables using random forests. To do so, I wanted to build a CI around each variable's estimated mean decrease in accuracy and see whether the intervals overlap between variables.
I looked at two approaches: (i) taking the standard error estimated by randomForest and constructing the CI using the formula estimate +/- 1.96*se, and (ii) calculating the CI following the approach of Ishwaran and Lu. These correspond to the following commands:
(i)
results.rf <- randomForest(Y ~ some Xs, data = data, importance = TRUE,
                           na.action = na.omit)
# standard errors of the permutation importance (column 3 of importanceSD)
importance <- as.matrix(results.rf$importanceSD[, 3])
# unscaled mean decrease in accuracy
importance1 <- as.matrix(importance(results.rf, type = 1, scale = FALSE))
blabla <- importance1 - 1.96 * importance   # lower bound
blabla1 <- importance1 + 1.96 * importance  # upper bound
output <- cbind(importance1, importance, blabla, blabla1)
(ii)
results_src <- rfsrc(Y ~ some Xs, data = data, importance = TRUE,
                     na.action = "na.omit", standardize = FALSE)
what <- subsample.rfsrc(results_src, B = 2000, bootstrap = TRUE)
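Written out, approach (i) amounts to the usual normal-approximation interval. The notation is my own: $\hat{I}_j$ is the unscaled mean decrease in accuracy for variable $j$ and $\widehat{\mathrm{se}}_j$ is the corresponding entry of importanceSD.

$$
\mathrm{CI}_j = \left[\hat{I}_j - 1.96\,\widehat{\mathrm{se}}_j,\ \hat{I}_j + 1.96\,\widehat{\mathrm{se}}_j\right]
$$

This only restates what my code does. It assumes $\hat{I}_j$ is approximately normal, which I have not verified.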
Approach (ii) was specifically designed to build CIs, whereas (i) is just an attempt on my part. I am surprised because the two give a very different picture. In particular, the CIs are very wide (at the 5% level) when using (ii). Many variables which clearly "matter" end up with CIs that include zero, even when I use the 0.16 bootstrap approach with many (2000) bootstrap iterations. They are much wider than those I get from (i).
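To make explicit what I mean by "include zero" and "overlap between variables", here is a minimal base-R sketch of how I summarise the intervals from either approach. The helper names ci_summary and ci_overlap are my own, not functions from any of the packages, and the numbers are made up purely for illustration.

```r
# Build normal-approximation intervals and flag those that contain zero.
# (Hypothetical helper, not part of randomForest or randomForestSRC.)
ci_summary <- function(estimate, se, z = 1.96) {
  lower <- estimate - z * se
  upper <- estimate + z * se
  data.frame(estimate, se, lower, upper,
             includes_zero = lower <= 0 & upper >= 0,
             row.names = names(estimate))
}

# Pairwise overlap: intervals i and j overlap when each one starts
# before the other one ends.
ci_overlap <- function(lower, upper) {
  starts_before_end <- outer(lower, upper, "<=")
  ov <- starts_before_end & t(starts_before_end)
  dimnames(ov) <- list(names(lower), names(lower))
  ov
}

# Toy values, invented for illustration only (not my real results).
est <- c(x1 = 0.020, x2 = 0.004, x3 = 0.012)
se  <- c(x1 = 0.004, x2 = 0.003, x3 = 0.002)

tab <- ci_summary(est, se)
ov  <- ci_overlap(tab$lower, tab$upper)
dimnames(ov) <- list(names(est), names(est))
print(tab)
print(ov)
```

With my real results I would pass the estimates and standard errors from (i), or the lower and upper bounds from (ii), instead of the toy vectors.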
Is it that the CIs from randomForestSRC are "conservative" because the main application is variable selection among "strong" (highly predictive) variables? I come from the social sciences, where our models are usually not very predictive (in linear regression settings we typically have a low R2, most often below 0.1). If I am willing to assume that the mean decrease in accuracy across OOB samples is normally distributed, is approach (i) valid? Or are the discrepancies between (i) and (ii) due to some trivial mistake or misunderstanding in my implementation?
Thanks in advance,
By the way, I get something similar to (i) when I run:
results_rfpermute <- rfPermute(Y ~ some Xs, data = data, na.action = na.omit)
rp.importance(results_rfpermute, scale = FALSE)