Using caret for random forests and mlp

rafanadal1986 · May 7, 2021, 3:09am

Hello guys,
I am relatively new to machine learning in general and to R in particular.
Is the caret package a stronger tool for random forests than the randomForest package?
From my understanding, caret helps optimize the hyperparameters of a random forest model. Can the randomForest package find the optimal hyperparameters as well, or do they have to be found by calling its function repeatedly with different settings? E.g.

randomForest(formula, data=NULL, ..., subset, na.action=na.fail)

If (caret) is a versatile tool for Random forests, can we say the same for other models?

I used caret to generate an mlp model for the same data frame, but the accuracy was poor.

Max · May 7, 2021, 3:17pm

I'm biased, but yes, it is appropriate for that and for other models, as is tidymodels.

caret, mlr, and tidymodels follow a methodology that is a lot less risky than repeatedly calling the same function with different parameters. They score each candidate setting on resampled data that was held out from the fit, rather than on the data used to fit it. Without that, you could be setting yourself up for overfitting.
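To make the resampling idea concrete, here is a minimal sketch of tuning a random forest with caret. The Boston data from MASS, the 5-fold setup, the `mtry` grid, and `ntree = 200` are all assumptions picked for illustration, not recommended settings.

```r
library(caret)
library(MASS)  # Boston housing data, used here only as a stand-in

set.seed(42)

# Every candidate mtry is scored on held-out folds, not on the rows it was fit to
ctrl <- trainControl(method = "cv", number = 5)

rf_fit <- train(
  medv ~ ., data = Boston,
  method    = "rf",                                # needs the randomForest package
  trControl = ctrl,
  tuneGrid  = expand.grid(mtry = c(2, 4, 6, 8)),   # illustrative grid
  ntree     = 200                                  # passed through to randomForest()
)

rf_fit$results   # resampled RMSE, R-squared and MAE for each mtry
rf_fit$bestTune  # the mtry with the lowest resampled RMSE
```

The final model is then refit on the full data with the selected `mtry`, so the choice of hyperparameter never depends on the rows used to score it.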

Poor accuracy could be a function of many things (including not having informative predictors). It might help to describe what you are trying to do, the type of data, etc.

rafanadal1986 · May 8, 2021, 9:31pm

Thank you, Max, for your reply.

rafanadal1986 · May 9, 2021, 1:18am

I used caret to train an mlp model with this code.

    library(MASS)   # provides the Boston data set
    library(caret)

    # Hold out 25% of the rows for testing
    DP = caret::createDataPartition(Boston$medv, p = 0.75, list = FALSE)

    train = Boston[DP, ]
    test = Boston[-DP, ]

    # Tune the hidden-layer size with 3-fold CV, using the training rows only
    mlp = caret::train(medv ~ ., data = train, method = "mlp",
                       trControl = trainControl(method = "cv", number = 3),
                       tuneGrid = expand.grid(size = 1:3), linOut = TRUE, metric = "RMSE")

    # Predict medv for the held-out rows (columns 1:13 are the predictors)
    Yp = predict(mlp, newdata = test[, 1:13])

I got this error message:

In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.

Can you please help me understand why I got this error?

Thank you Max

Max · May 10, 2021, 4:40pm

That's a warning (not an error). There are cases where the model predicts the same value for all samples. When that happens, R² can't be calculated and the result is NA. It's not a warning that should stop you from using the results; the models with that issue are no good anyway.
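A small base-R sketch of why that happens, assuming R² is computed as the squared correlation between predictions and observed values (the numbers below are made up for illustration):

```r
# Hypothetical observed values and a model that predicts one constant for every sample
obs  <- c(21.6, 34.7, 33.4, 36.2, 28.7)
pred <- rep(mean(obs), length(obs))

# Constant predictions have zero variance, so the correlation is undefined
r2 <- suppressWarnings(cor(pred, obs)^2)
is.na(r2)

# RMSE is still defined, which is why only some of the resampled measures go missing
rmse <- sqrt(mean((pred - obs)^2))
rmse
```

If a fold ends up with a constant prediction like this, its R² is NA while its RMSE is still a number, and that is what triggers the message about missing values in the resampled performance measures.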
