Hi, I'm trying to find the optimal "mtry" value for a Random Forest classifier on a given set of training and test data. I want to plot the error rate on the test data and the OOB error on the training data for each value of "mtry", but the plot won't show the line corresponding to oob.err. Does anyone know what mistake I've made?
oob.err = double(10)
test.err = double(10)
for (mtry in 1:10)
{
  model = randomForest(train[,1]~., data = train[,-1], mtry = mtry, ntree = 400)
  oob.err[mtry] = model$err.rate[400,1]
  pred = predict(model, test[,-1])
  test.err[mtry] = with(test[,-1], mean( (test[,1]-pred)^2 ))
}
Hi @Yarnabrina, my apologies. The code I used for the plot is
matplot(1:mtry, cbind(test.err, oob.err), pch = 23, col = c("red", "blue"), type = "b", ylab = "Mean Squared Error")
legend("topright", legend = c("OOB", "Test"), pch = 23, col = c("red", "blue"))
The datasets train and test are subsets of the original dataset below, which contains observations of human (positive) and bacteriophage (negative) DNA sequences. It has 100+ variables; I've only included the first 20 or so.
546 neg A G A C G C G C T T G A A C C T G A T G T T C C T
172 pos T C A G G T G A T C T A C C C A C C T T G G C C T
147 pos C T C T C T A G T G G T C A G T G T T G G A A C T
257 pos A G A A C T G A G G G C C C T A A A C T A T G C T
338 neg C C C G C A T A T T G C C A G C A T G G C C T T T
578 neg G C G T G G C T T T G G G A A C C C T C G T G G T
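
For reference, here is a minimal sketch of one way the loop and plot could be written for a classification problem. It rests on assumptions that are not confirmed above: that the first column is a factor with levels pos/neg, and that every predictor is a factor of bases. If the label is a factor, subtracting it from the predictions gives NA with a warning, so the sketch measures test error as the misclassification rate instead of a squared difference. It also passes the curves to matplot in the same order as the legend labels, since in the plot code above cbind(test.err, oob.err) is paired with legend = c("OOB", "Test"). The data frame dna and the train/test split below are made-up stand-ins, not the real data.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(1)

# Hypothetical stand-in for the DNA data: 300 sequences of 25 bases each.
bases <- c("A", "C", "G", "T")
n <- 300
p <- 25
dna <- as.data.frame(matrix(sample(bases, n * p, replace = TRUE), nrow = n))
dna[] <- lapply(dna, factor, levels = bases)

# Made-up label that depends on the first base, with 10% of labels flipped.
label <- ifelse(dna$V1 %in% c("A", "G"), "pos", "neg")
flip <- runif(n) < 0.1
label[flip] <- ifelse(label[flip] == "pos", "neg", "pos")
dna <- cbind(class = factor(label, levels = c("neg", "pos")), dna)

# Hypothetical train/test split.
idx <- sample(n, 200)
train <- dna[idx, ]
test <- dna[-idx, ]

mtry_values <- 1:10
n_tree <- 400
oob.err <- double(length(mtry_values))
test.err <- double(length(mtry_values))

for (m in mtry_values) {
  model <- randomForest(x = train[, -1], y = train[, 1], mtry = m, ntree = n_tree)
  # OOB misclassification rate after the last tree.
  oob.err[m] <- model$err.rate[n_tree, "OOB"]
  # Test misclassification rate: share of predicted labels that differ from the truth.
  pred <- predict(model, test[, -1])
  test.err[m] <- mean(pred != test[, 1])
}

# Column order (OOB, Test) matches the legend order and colours.
matplot(mtry_values, cbind(oob.err, test.err), type = "b", pch = 23, lty = 1,
        col = c("red", "blue"), xlab = "mtry", ylab = "Misclassification rate")
legend("topright", legend = c("OOB", "Test"), pch = 23, col = c("red", "blue"))
```

If the response really is numeric (a regression forest), this does not apply as written: model$err.rate would not exist and the OOB error would come from model$mse instead.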