predict function

predict.rpart vs. predict.train

  1. Decision tree using rpart

pred3 <- predict(tree3, type = "raw")

confusionMatrix(pred3, trainingSet$variable)

  2. Random forest using train

pred2 <- predict(rf2, trainingSet)

confusionMatrix(pred2, trainingSet$variable)

Why are the two predict functions different? According to the help pages in R, the arguments are the same:

## S3 method for class 'train'
predict(object, newdata = NULL, type = "raw",
  na.action = na.omit, ...)

## S3 method for class 'rpart'
predict(object, newdata,
       type = c("vector", "prob", "class", "matrix"),
       na.action = na.pass, ...)

I can try to give you an answer, but it seems like your question might really be "why do these two sets of code give different results?".

I can answer what you've asked, but I can't answer what I think you are really asking unless I can reproduce tree3, pred3, rf2, and pred2.

A few reasons:

  1. They are different S3 methods for different types of models.

  2. They were written by different people in different packages.

There are informal conventions in R for doing things but they are not rigidly enforced (as they are in other languages). When we want to predict new samples, the convention is to use a predict method. However, the arguments and syntax between models and packages are allowed to be different.
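To make the S3 point concrete, here is a minimal sketch with two made-up classes. The names model_a, model_b, fit_a, fit_b, and new_rows are hypothetical and do not come from caret or rpart; the only point is that the same predict() call dispatches to methods whose arguments differ.

```r
# Two hypothetical model objects; the classes are invented for illustration.
fit_a <- structure(list(value = 1), class = "model_a")
fit_b <- structure(list(levels = c("no", "yes")), class = "model_b")

# Method for class "model_a": newdata is optional, type defaults to "raw".
predict.model_a <- function(object, newdata = NULL, type = "raw", ...) {
  n <- if (is.null(newdata)) 1L else nrow(newdata)
  rep(object$value, n)
}

# Method for class "model_b": newdata is required, type has other choices.
predict.model_b <- function(object, newdata, type = c("class", "prob"), ...) {
  type <- match.arg(type)
  n <- nrow(newdata)
  if (type == "class") {
    factor(rep(object$levels[1], n), levels = object$levels)
  } else {
    matrix(1 / length(object$levels), nrow = n, ncol = length(object$levels),
           dimnames = list(NULL, object$levels))
  }
}

new_rows <- data.frame(x = 1:3)

# Same generic, different method chosen from the class of the first argument.
pred_a <- predict(fit_a, new_rows)
pred_b <- predict(fit_b, new_rows, type = "class")
```

Nothing forces the two methods to agree on argument names, defaults, or return types, which is the situation with predict.train and predict.rpart.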

What I think you are asking...

I suspect that this is not true.

I'm going to guess that tree3 is a train object that used method = "rpart" since you used the predict code that corresponds to a train object. I can't tell without a reproducible example. I don't know what data were used, what version of R or caret, if this is a classification or regression model, and so on.
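One way to check that guess yourself: a model fit directly with rpart() should reject type = "raw", because that value belongs to predict.train. The sketch below uses iris as stand-in data since I don't have your trainingSet, and the object names are mine, not yours.

```r
library(rpart)

# Stand-in data: iris replaces the unknown trainingSet from the question.
fit <- rpart(Species ~ ., data = iris, method = "class")

# predict.rpart accepts type = "vector", "prob", "class", or "matrix".
pred_class <- predict(fit, type = "class")

# type = "raw" is a predict.train argument; on an rpart object the call
# fails, and the error message is captured here instead of stopping.
raw_attempt <- tryCatch(
  predict(fit, type = "raw"),
  error = function(e) conditionMessage(e)
)

class(fit)                  # class of the fitted object
is.character(raw_attempt)   # TRUE when the type = "raw" call failed
```

Running class(tree3) on your own object would tell you which predict method is actually being used.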

One other thing... it's probably a bad idea to re-predict the training set. That random forest model should give you close to zero errors on the data it was trained on (is this a classification model?) no matter how good it is in reality.
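Here is a rough sketch of how to compare the two estimates. It is not your model: it assumes iris as stand-in data, an arbitrary 100-row training split, and a deliberately overgrown rpart tree in place of the random forest so that it runs with only the rpart package.

```r
library(rpart)

set.seed(1)

# Stand-in data and an arbitrary split (assumptions, not from the question).
in_train   <- sample(nrow(iris), 100)
train_rows <- iris[in_train, ]
test_rows  <- iris[-in_train, ]

# Grow the tree with no complexity penalty so it can memorize the training set.
fit <- rpart(Species ~ ., data = train_rows, method = "class",
             control = rpart.control(cp = 0, minsplit = 2, xval = 0))

# Accuracy on the rows used for fitting versus rows the model never saw.
train_acc <- mean(predict(fit, train_rows, type = "class") == train_rows$Species)
test_acc  <- mean(predict(fit, test_rows,  type = "class") == test_rows$Species)

c(training = train_acc, held_out = test_acc)
```

The held-out number is the one to report; the training number mostly tells you how flexible the model is.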

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.


Thank you, this is helpful!

  1. Random forest
    When I use pred <- predict(rf2) instead of pred <- predict(rf2, trainingSet),
    I get the error "Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length".

I don't see why I need to pass trainingSet to predict() when I already specify the reference in confusionMatrix(pred2, trainingSet$variable).

  2. Decision tree (classification)
    Why don't I need to pass the data to predict() in this case?

We need a reproducible example.
