Hi all,
I need help with the caret::train function. While experimenting with R, I created a new variable called "age" in the Auto data frame that labels each car as "old" if its year is below the median of the variable "year", and as "new" otherwise. Now I want to predict this label with LDA using 10-fold CV. I understand that setting classProbs=TRUE in trainControl should make the method return class probabilities along with the assigned class, but I can't find where these are returned for each observation. Ideally I want something similar to the argument "CV=TRUE" in MASS::lda, which returns the cross-validated class and posterior for every observation, but using k-fold CV instead of LOOCV. Any ideas? Here's the code:
library(ISLR)
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
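Auto <- Auto
# Create the label vector and add it as a new column to the Auto data frame
age <- rep("new", nrow(Auto))
median(Auto$year)
#> [1] 76
age[Auto$year < 76] <- "old"
# Store the label as a factor so caret treats the problem as classification
Auto$age <- factor(age)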
set.seed(123)
train_control <- trainControl(method = "cv", number = 10, classProbs = TRUE)
# Train the model
lda_auto_10cv <- train(age ~ mpg + cylinders + displacement + acceleration + weight + horsepower,
                       data = Auto, method = "lda", trControl = train_control)
print(lda_auto_10cv)
#> Linear Discriminant Analysis
#>
#> 392 samples
#> 6 predictor
#> 2 classes: 'new', 'old'
#>
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 352, 353, 353, 353, 353, 353, ...
#> Resampling results:
#>
#>   Accuracy   Kappa
#>   0.7448077  0.4909725
Created on 2019-10-16 by the reprex package (v0.3.0)
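One direction that might get close to what I'm after is the savePredictions argument of trainControl, which as far as I can tell keeps the held-out predictions from each fold in the fitted object's pred element. The sketch below is only that, a sketch: it rebuilds the label so it runs on its own, and the names auto_df, ctrl, fit and cv_preds are placeholders I made up, not anything from caret.

```r
library(ISLR)
library(caret)

# Rebuild the old/new label so this snippet is self-contained
auto_df <- ISLR::Auto
auto_df$age <- factor(ifelse(auto_df$year < median(auto_df$year), "old", "new"))

set.seed(123)
# savePredictions = "final" asks caret to keep the held-out predictions
# for the final model; classProbs = TRUE adds one probability column per class
ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE, savePredictions = "final")

fit <- train(age ~ mpg + cylinders + displacement + acceleration + weight + horsepower,
             data = auto_df, method = "lda", trControl = ctrl)

# Each row was predicted by a model that did not see it during fitting;
# sort by rowIndex to line the predictions up with the original data
cv_preds <- fit$pred[order(fit$pred$rowIndex), ]
head(cv_preds[, c("rowIndex", "obs", "pred", "new", "old", "Resample")])

# Cross-validated confusion matrix built from the held-out predictions
table(predicted = cv_preds$pred, observed = cv_preds$obs)
```

If this works the way I think it does, cv_preds would play the role of the class and posterior components that MASS::lda returns with CV=TRUE, only with 10 folds instead of leave-one-out.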
Thanks!