I'm using the randomForest package to classify a dataset of continuous variables whose observations fall into four categories (a, b, c, d). With the varImp() and importance() functions I can get the class-specific and overall permutation variable importance. I would like to find out which of my variables provide the greatest separation between specific pairs of groups, e.g. which variables have the highest importance for separating group a from group c. I've come across the varImpGroup() function in RFgroove, but it seems to be aimed at the importance of a subset (group) of variables rather than at groups of observations.

Is there a way to get variable importance for pairwise comparisons of groups, without having to run a separate classification on only two groups at a time?

Update: For anyone wondering, I contacted Andy Liaw, who maintains the randomForest package. He suggested using the localImp = TRUE argument to produce a matrix of casewise variable importance, and then averaging the values across all the individuals in the selected groups to determine between-group variable importance.
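
For reference, here is a minimal sketch of how that suggestion might look in practice. It is my reading of the advice, not code supplied by the package maintainer. The built-in iris data (three classes) stands in for my four-group dataset, and `pairwise_importance()` is a hypothetical helper name, not a function from randomForest. The forest is still trained on all classes, so the result is probably best read as an approximation of pairwise importance rather than the importance from a true two-class model.

```r
library(randomForest)

set.seed(42)

# Assumption: iris is used as stand-in data; swap in your own data frame
# and grouping factor.
rf <- randomForest(Species ~ ., data = iris, localImp = TRUE)

# With localImp = TRUE the fit stores a p x n matrix of casewise importance:
# one row per variable, one column per case.
local_imp <- rf$localImportance

# Hypothetical helper: average the casewise importance of each variable
# over every case that belongs to one of the two selected groups.
pairwise_importance <- function(local_imp, groups, pair) {
  keep <- groups %in% pair
  sort(rowMeans(local_imp[, keep, drop = FALSE]), decreasing = TRUE)
}

# Variables ranked by mean casewise importance for setosa and virginica cases
pair_imp <- pairwise_importance(local_imp, iris$Species,
                                c("setosa", "virginica"))
print(pair_imp)
```

The same call with `c("a", "c")` and my own grouping factor would give the ranking for groups a and c.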
