Hi,
I want to tune a ridge regression where I use 10-fold cross-validation as the outer resampling method and, for the inner resampling method, a single split into a training part and a development part. In other words, I want a training/development/testing framework.
My questions are:
- Is it correct in this scenario to use validation_split for the inner resampling in nested_cv (see the example code below)?
- Also, is there a good way to confirm that this methodological flow is actually happening?
For example, is it a confirmation that, when using validation_split for the inner split, the printed split says "Validation" (which would equal the development portion I'm after)?
<Training/Validation/Total>
<7/2/9>
Whereas, when using, e.g., v-fold cross-validation for the inner resampling as well, it instead says "Assess":
<Analysis/Assess/Total>
<8/1/9>
That is, "Validation" would mean that the data is never used for training, whereas "Assess" would indicate that the data has been or will be used for training in some of the other folds (as in v-fold cross-validation).
Example data to check the splits:
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
x1y1 <- tibble::tibble(x1, y1)

# Outer: 10-fold CV; inner: a single training/development split
nested_resampling_dev <- rsample::nested_cv(x1y1,
  outside = rsample::vfold_cv(v = 10, repeats = 1),
  inside = rsample::validation_split(prop = 3/4))

# Print the first inner split of the first outer fold
nested_resampling_dev$inner_resamples[[1]]$splits[[1]]
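To double-check which rows end up where, here is a minimal inspection sketch (my assumption being that rsample's analysis() and assessment() accessors work on the inner validation split the same way they do on ordinary rsplit objects):

inner_dev_split <- nested_resampling_dev$inner_resamples[[1]]$splits[[1]]
rsample::analysis(inner_dev_split)    # inner training portion
rsample::assessment(inner_dev_split)  # development ("Validation") portion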
Comparison data:

# Inner resampling is also v-fold CV; each outer analysis set has only
# 9 rows, so the inner CV can have at most 9 folds
nested_resampling_2_nfolds <- rsample::nested_cv(x1y1,
  outside = rsample::vfold_cv(v = 10, repeats = 1),
  inside = rsample::vfold_cv(v = 9, repeats = 1))
nested_resampling_2_nfolds$inner_resamples[[1]]$splits[[1]]
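And a sketch (again my own check, not from the tutorial, and relying on the x1 values being unique in this toy data) to confirm the distinction I describe above: under validation_split the development rows never appear in the training portion, while under inner v-fold CV every row is used for training in v - 1 of the folds:

# validation_split: development rows are disjoint from the training rows
dev_set <- rsample::assessment(inner_dev_split)
train_set <- rsample::analysis(inner_dev_split)
any(dev_set$x1 %in% train_set$x1)  # expect FALSE

# inner v-fold CV: count, per row, how many folds use it for training
inner_cv_splits <- nested_resampling_2_nfolds$inner_resamples[[1]]$splits
table(unlist(lapply(inner_cv_splits, function(s) rsample::analysis(s)$x1)))
# expect each value to appear v - 1 = 8 times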
(I have had much help developing this from the nested resampling tutorial: https://www.tidymodels.org/learn/work/nested-resampling/)
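For completeness, here is a rough sketch of the full flow I'm aiming for, with ridge regression via glmnet (the glmnet part, the toy data, and all names in this block are my own assumptions; the tutorial uses a different model, but the nested structure is the same): pick lambda on the inner development set, then refit on the whole outer analysis set and score once on the outer assessment set. A sketch, not a definitive implementation:

library(rsample)
library(glmnet)

set.seed(123)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(100)

nested <- nested_cv(dat,
  outside = vfold_cv(v = 10),
  inside = validation_split(prop = 3/4))

# Candidate penalties, decreasing as glmnet prefers
lambdas <- 10^seq(1, -3, length.out = 20)

outer_rmse <- vapply(seq_len(nrow(nested)), function(i) {
  inner <- nested$inner_resamples[[i]]$splits[[1]]
  train <- analysis(inner)   # inner training portion
  dev <- assessment(inner)   # inner development portion

  # Fit ridge (alpha = 0) on the inner training portion over the whole path
  fit <- glmnet(as.matrix(train[c("x1", "x2")]), train$y,
                alpha = 0, lambda = lambdas)
  pred <- predict(fit, newx = as.matrix(dev[c("x1", "x2")]))
  dev_rmse <- apply(pred, 2, function(p) sqrt(mean((dev$y - p)^2)))
  best_lambda <- fit$lambda[which.min(dev_rmse)]  # chosen on the dev set only

  # Refit on the full outer analysis set, score once on the outer assessment set
  outer_train <- analysis(nested$splits[[i]])
  outer_test <- assessment(nested$splits[[i]])
  refit <- glmnet(as.matrix(outer_train[c("x1", "x2")]), outer_train$y,
                  alpha = 0, lambda = lambdas)
  p <- predict(refit, newx = as.matrix(outer_test[c("x1", "x2")]), s = best_lambda)
  sqrt(mean((outer_test$y - p)^2))
}, numeric(1))

mean(outer_rmse)  # nested-CV estimate of test RMSE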