I want to tune a ridge regression where the outer resampling method is 10-fold cross-validation and the inner resampling method is a single split into a training part and a development part. In other words, I want a training/development/testing framework.
My questions are:
- In this scenario, is it correct to use `validation_split` for the inner resampling in `nested_cv` (see example code below)?
- Also, is there a good way to confirm that this methodological flow is actually happening?
For example, is it a confirmation that, when using `validation_split` for the inner split, the printed split says "Validation" (which equals the development portion I'm after), whereas when using, e.g., v-fold cross-validation for the inner resampling as well, it instead says "Assessment"? That is, would "Validation" mean that the data is not used for training, whereas "Assessment" indicates that the data has been or will be used for training (in v-fold cross-validation)?
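To make the intended flow concrete, here is a package-free base-R sketch of the three-way partition I'm aiming for (the indices are illustrative only, not how rsample actually samples): one outer fold holds out a test row, and the remaining analysis rows are split once into training and development.

```r
set.seed(1)
idx <- 1:10

# Outer 10-fold CV: for one fold, 1 test row and 9 analysis rows
test_idx     <- 1L
analysis_idx <- setdiff(idx, test_idx)

# Inner validation split of the analysis rows (prop = 3/4)
train_idx <- sample(analysis_idx, size = floor(3/4 * length(analysis_idx)))
dev_idx   <- setdiff(analysis_idx, train_idx)

# Training, development, and test rows are mutually disjoint and cover all rows
stopifnot(length(intersect(train_idx, dev_idx)) == 0)
stopifnot(length(intersect(train_idx, test_idx)) == 0)
stopifnot(all(sort(c(train_idx, dev_idx, test_idx)) == idx))
```

If this is what `nested_cv` with an inner `validation_split` produces, then the development rows are never trained on within an inner resample, which is the distinction I'm asking about.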
Example data to check the splits:

```r
library(tibble)
library(rsample)

x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x1y1 <- tibble(x1, y1)

# Inner resampling: a single training/development split
nested_resampling_dev <- rsample::nested_cv(
  x1y1,
  outside = rsample::vfold_cv(v = 10, repeats = 1),
  inside = rsample::validation_split(prop = 3/4)
)
nested_resampling_dev$inner_resamples[[1]]$splits[[1]]
```
```r
# Inner resampling: v-fold CV as well. Note: v cannot exceed the number of
# rows in each outer analysis set (9 here), so a smaller v is needed on this toy data.
nested_resampling_2_nfolds <- rsample::nested_cv(
  x1y1,
  outside = rsample::vfold_cv(v = 10, repeats = 1),
  inside = rsample::vfold_cv(v = 3, repeats = 1)
)
nested_resampling_2_nfolds$inner_resamples[[1]]$splits[[1]]
```
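Beyond reading the print labels, the split contents can be checked directly with rsample's `analysis()` and `assessment()` accessors. A minimal, self-contained sketch of such a check (object names are my own): the inner training and development rows together should reconstruct the outer fold's analysis set, and the development rows should never overlap the outer test row.

```r
library(tibble)
library(rsample)

x1y1 <- tibble(x1 = 1:10, y1 = 1:10)
nested <- rsample::nested_cv(
  x1y1,
  outside = rsample::vfold_cv(v = 10, repeats = 1),
  inside = rsample::validation_split(prop = 3/4)
)

outer_split <- nested$splits[[1]]
inner_split <- nested$inner_resamples[[1]]$splits[[1]]

outer_train <- rsample::analysis(outer_split)    # 9 rows fed to the inner loop
test_set    <- rsample::assessment(outer_split)  # 1 held-out test row
train_set   <- rsample::analysis(inner_split)    # inner training portion
dev_set     <- rsample::assessment(inner_split)  # inner development portion

# training + development rows partition the outer analysis set ...
stopifnot(nrow(train_set) + nrow(dev_set) == nrow(outer_train))
# ... and the development rows never touch the outer test row
stopifnot(length(intersect(dev_set$x1, test_set$x1)) == 0)
```

If assertions like these pass for every outer fold, that would confirm the training/development/testing flow regardless of what the print method calls each portion.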
(I developed this with much help from this nested resampling tutorial: https://www.tidymodels.org/learn/work/nested-resampling/)