Cross-validation with Stratified Sampling

Model 1: trained on Start (raw data)
n <- 145 ; p <- 8

X <- as.matrix(Start)

Y <- as.matrix(ClassRoses)
dim(X)
dim(Y)

Xcal <- X[1:100, ] ; ycal <- Y[1:100]
Xval <- X[101:145, ] ; yval <- Y[101:145]

m <- 50 ; p <- 8



nlv <- 20
segm <- segmkf(nrow(Xcal), y = NULL, K = 5, type = "random", nrep = 1)  # first argument = number of rows to split (nrow(Xcal) = 100, not 45)

pars <- mpars(nlv = 1:nlv)

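# gridscorelv() fits each model on (Xcal, ycal) and scores it on the single
# validation set (Xval, yval): a calibration/validation grid, not k-fold CV
res <- gridscorelv(Xcal, ycal, Xval, yval, score = err, fun = plsrda, nlv = 1:nlv, verb = TRUE)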

Xtrain <- rbind(Xcal, Xval)
ytrain <- c(ycal, yval)

dim(Xtrain)
length(ytrain)  # ytrain is a vector, so dim(ytrain) would return NULL

# res <- gridcv(Xcal, ycal, segm, score = err, fun = plsrda, pars, verb = TRUE)  # err (error rate) suits a classification model better than rmsep

plotscore(res$nlv, res$y1, main = "ERR", xlab = "Nb. LVs", ylab = "Value")

u <- res[res$y1 == min(res$y1), ][1, , drop = FALSE]


# Final model on Xtrain = Xcal+Xval
fm <- plsrda(Xtrain, ytrain, nlv = u$nlv)
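The split above takes rows 1:100 for calibration and 101:145 for validation, so the class balance of each set depends on how the rows happen to be ordered. A stratified split instead draws the same fraction from each class. The base-R sketch below is one way to do this under stated assumptions: strat_split() is a hypothetical helper (not an rchemo function), and the 60/85 class counts are invented toy labels because the real ClassRoses data are not available.

```r
# Hypothetical helper (not from rchemo): stratified calibration/validation split.
# Within each class, a fraction `prop` of the rows is drawn at random for
# calibration; the remaining rows form the validation set.
strat_split <- function(y, prop, seed = 1) {
  set.seed(seed)                                  # reproducible draw
  by_class <- split(seq_along(y), as.factor(y))   # row indices per class
  cal <- unlist(lapply(by_class, function(idx) {
    idx[sample.int(length(idx), round(prop * length(idx)))]
  }), use.names = FALSE)
  cal <- sort(cal)
  list(cal = cal, val = setdiff(seq_along(y), cal))
}

# Toy labels only (assumption): 145 petals, 60 inoculated and 85 controls
y <- rep(c("inoculated", "control"), times = c(60, 85))
s <- strat_split(y, prop = 100 / 145)

length(s$cal)     # 100 calibration rows
length(s$val)     # 45 validation rows
table(y[s$cal])   # class counts in the calibration set
table(y[s$val])   # class counts in the validation set

# With the real data, the index vectors would replace 1:100 and 101:145:
# Xcal <- X[s$cal, ] ; ycal <- Y[s$cal]
# Xval <- X[s$val, ] ; yval <- Y[s$val]
```

Drawing within each class keeps the inoculated/control ratio close to the full-data ratio in both sets (here 41/59 in calibration and 19/26 in validation).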
Prediction on Day 1 data
Xtest <- Day1mx
ytest <- as.matrix(ClassRoses)

     
pred1 <- predict(fm, Xtest)$pred
     
err(ytest, pred1)
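Assuming err() returns the classification error rate (the share of petals assigned to the wrong class), the same number can be computed by hand, and a confusion matrix shows which class the mistakes come from. error_rate() is a hypothetical helper and the vectors are toy labels, not the rose data; with the real data, pred1 (a one-column matrix) and ytest could be passed the same way, since as.character() flattens a matrix to a vector.

```r
# Hypothetical helper: misclassification rate, i.e. the proportion of
# observations whose predicted class differs from the observed class
error_rate <- function(pred, y) mean(as.character(pred) != as.character(y))

# Toy labels (assumption), not the rose data
y_obs  <- c("inoculated", "inoculated", "control", "control", "control")
y_pred <- c("inoculated", "control",    "control", "control", "inoculated")

error_rate(y_pred, y_obs)                    # 2 wrong out of 5 -> 0.4
table(observed = y_obs, predicted = y_pred)  # confusion matrix
```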

My questions are:

  1. I get this output when I use the gridscorelv function for cross-validation:
    Nb combinations = 0
    End.
    How can I solve this problem and add some combinations to the analysis?

  2. I would like to use stratified sampling (I am running a PLS-DA classification to separate inoculated from non-inoculated petals).

Thank you for your help!

Regards,

Mercedes

Thanks for including code in your post. To make it reproducible, please include only objects that are available to others: for example, we can't run your example because we don't have the object Start, which makes it a little hard to help you. If you want to do cross-validation with stratified sampling, you could look at the vfold_cv() function from the rsample package. It is designed for the tidymodels ecosystem, but you can extract regular data frames from each split with training() and testing().

library(rsample)

folds <- vfold_cv(mtcars, v = 10, strata = cyl)
folds
#> #  10-fold cross-validation using stratification 
#> # A tibble: 10 × 2
#>    splits         id    
#>    <list>         <chr> 
#>  1 <split [27/5]> Fold01
#>  2 <split [28/4]> Fold02
#>  3 <split [28/4]> Fold03
#>  4 <split [28/4]> Fold04
#>  5 <split [29/3]> Fold05
#>  6 <split [29/3]> Fold06
#>  7 <split [29/3]> Fold07
#>  8 <split [30/2]> Fold08
#>  9 <split [30/2]> Fold09
#> 10 <split [30/2]> Fold10
fold_1_train <- training(folds$splits[[1]]) 
fold_1_test <- testing(folds$splits[[1]]) 
fold_1_test
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Created on 2022-11-28 with reprex v2.0.2
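If you need plain row indices (for example, to subset the X and Y matrices passed to rchemo functions) rather than rsample split objects, a base-R sketch of stratified fold assignment is shown below. strat_folds() is a hypothetical helper, not part of rsample or rchemo: it shuffles rows within each class and deals them out to folds 1, 2, ..., K in turn. That is the idea behind stratification, but rsample's own algorithm may differ in detail.

```r
# Hypothetical helper (not from rsample or rchemo): stratified K-fold assignment.
# Rows are shuffled within each class and then dealt out to folds 1, 2, ..., K
# in turn, so every class is spread as evenly as possible across the folds.
strat_folds <- function(y, K = 5, seed = 1) {
  set.seed(seed)                                   # reproducible shuffle
  y <- as.factor(y)
  fold <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    idx <- idx[sample.int(length(idx))]            # shuffle within the class
    fold[idx] <- rep_len(seq_len(K), length(idx))  # deal out fold numbers
  }
  # One vector of held-out (test) row indices per fold
  lapply(seq_len(K), function(k) which(fold == k))
}

folds <- strat_folds(mtcars$cyl, K = 10)

lengths(folds)                             # 5 4 4 4 3 3 3 2 2 2
cyl <- factor(mtcars$cyl)
sapply(folds, function(i) table(cyl[i]))   # cyl counts in each test fold

# Row indices for the first fold; the same indices can subset matrices
test_1  <- folds[[1]]
train_1 <- setdiff(seq_len(nrow(mtcars)), test_1)
fold_1_test <- mtcars[test_1, ]
```

Because the helper returns integer indices, the same vectors could index X and Y directly (X[train_1, ], Y[train_1]) when fitting plsrda() fold by fold.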
