I am using caret::safs() for some supervised feature selection, and I'm trying to better understand how to set the resampling scheme using trainControl and safsControl. Both seem to have options to set the resampling method, number and repeats. I've been reading through the docs and what examples I can find, and I'm not totally clear on whether I need to set the resampling scheme in both or just one.
The caret package book even notes that the options are similar between the two functions:

"Some important options to safsControl [...] indexOut, etc: options similar to those for train to control resampling."
My questions boil down to the following:
- If resampling should be set in both, why?
- If resampling only needs to be defined in one of them, which one?
Here's a non-working representative example of the code I'm using to conduct safs:

    library(caret)

    # my_recipe and training_data are not defined here

    # set resampling scheme in trainControl
    train_ctrl <- trainControl(method = "repeatedcv",
                               number = 10,
                               repeats = 3,
                               classProbs = TRUE,
                               summaryFunction = twoClassSummary,
                               savePredictions = "final",
                               # FALSE here but TRUE below so as to not square number of workers
                               allowParallel = FALSE)

    caretSA$fitness_extern <- twoClassSummary

    # also set it in safsControl - is this needed?
    safs_ctrl <- safsControl(functions = caretSA,
                             method = "repeatedcv",
                             number = 10,
                             repeats = 3,
                             metric = c(internal = "ROC", external = "ROC"),
                             maximize = c(internal = TRUE, external = TRUE),
                             allowParallel = TRUE,
                             verbose = TRUE)

    sa_results <- safs(my_recipe,
                       data = training_data,
                       iters = 10,
                       method = "glm",
                       # are both of these needed???
                       trControl = train_ctrl,
                       safsControl = safs_ctrl)
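For what it's worth, one quick check is to build the two control objects with deliberately different values and inspect what each one stores. This is only a sketch: it assumes both constructors return plain lists with method, number and repeats elements, and the values 5 and 2 are arbitrary ones chosen so the two objects are easy to tell apart. It shows that the settings are held separately, but not how safs() uses each of them.

```r
library(caret)

# Control object intended for train(): 10-fold CV repeated 3 times
train_ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# Control object intended for safs(): arbitrary, deliberately different values
safs_ctrl <- safsControl(functions = caretSA,
                         method = "repeatedcv", number = 5, repeats = 2)

# Each object keeps its own copy of the resampling settings
str(train_ctrl[c("method", "number", "repeats")])
str(safs_ctrl[c("method", "number", "repeats")])
```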