Extract data sample from a single tree using randomForest

I am using the randomForest() function for regression. Is there a way to get the data sample that was used for a single tree in the randomForest package? I am aware that I can get the structure of a single tree using the getTree() function. The returned object contains predicted values for the outcome, and I am trying to find the actual outcome so that I can calculate the MSE for that particular tree. I know that the randomForest object already contains this information, but it would be neat to do it manually as well.

As per GreyMerchant's request, I've included a reprex below. I use the airquality data set included with R.

library(randomForest)

# Make this example reproducible
set.seed(1)

# Fit the random forest model
# (airquality contains missing values, so incomplete rows are dropped)
model <- randomForest(
  formula = Ozone ~ .,
  data = airquality,
  na.action = na.omit
)

# Get first tree
tree_1 <- getTree(model, k = 1)

Now I would like to get the data that was actually used for tree number 1. For an ordinary regression I would simply use the data passed to the data argument, but from my understanding, a random forest fits each tree on a sub-sample of the original data.

I want that data so that I can manually reproduce the values in model$mse.

Hello there,

It is going to be a lot easier if you can provide a simple example/reprex, and then we can take it from there. See here: FAQ: How to do a minimal reproducible example (reprex) for beginners

I managed to find a thread on Stack Overflow that helped solve my problem. Long story short, I can set the keep.inbag argument to TRUE in the randomForest() function to keep track of which samples are "in-bag" in which trees.
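For anyone landing here later, this is a minimal sketch of how that could look with the airquality example. It assumes that model$inbag is an n-by-ntree matrix whose entries are zero for out-of-bag rows (depending on the package version, non-zero entries may be 0/1 indicators or draw counts), and that the first element of model$mse is the out-of-bag MSE after the first tree. The names aq, inbag_1, train_1, oob_1, pred_1 and mse_1 are just illustrative.

```r
library(randomForest)

set.seed(1)

# Drop incomplete rows up front so that row i of aq matches row i of model$inbag
aq <- na.omit(airquality)

model <- randomForest(
  Ozone ~ .,
  data = aq,
  keep.inbag = TRUE  # keep track of which rows were in-bag for each tree
)

# In-bag information for tree 1 (one entry per row of aq)
inbag_1 <- model$inbag[, 1]

# Rows used to grow tree 1; rep() repeats a row when the entry is a draw count
train_1 <- aq[rep(seq_len(nrow(aq)), times = inbag_1), ]

# Rows that tree 1 never saw (out-of-bag)
oob_1 <- aq[inbag_1 == 0, ]

# predict.all = TRUE returns the predictions of every individual tree
pred_all <- predict(model, newdata = oob_1, predict.all = TRUE)
pred_1 <- pred_all$individual[, 1]

# Out-of-bag MSE of tree 1, computed by hand
mse_1 <- mean((oob_1$Ozone - pred_1)^2)

# Compare with the value stored by randomForest
c(manual = mse_1, stored = model$mse[1])
```

Note that only the first element of model$mse belongs to a single tree. Later elements are based on the out-of-bag predictions averaged over all trees grown so far, so they will not match the MSE of tree k computed on its own.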
