During cross-validation, fit_resamples returns each metric averaged over the validation (assessment) sets.
lr_model <-
linear_reg() |>
set_engine('lm')
lr_wf <-
workflow() |>
add_recipe(basic_recipe) |>
add_model(lr_model)
lr_cv <-
  lr_wf |>
  fit_resamples(
    folds,                        # `folds` and `control` are defined earlier
    metrics = metric_set(rmse),
    control = control
  )
# Let's extract the results from the CV so we can compare this model with others
lr_cv |>
collect_metrics()
# This is the validation RMSE, averaged over the folds
# .metric .estimator mean n std_err .config
# <chr> <chr> <dbl> <int> <dbl> <chr>
# rmse standard 0.161 10 0.000370 Preprocessor1_Model1
The issue I have is how to get the corresponding training error. The same issue occurs after tuning hyperparameters.
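One avenue I can think of is the extract argument of control_resamples(): the function it receives is the workflow fitted on each fold's analysis (training) set, so for the lm engine the training RMSE can be computed from the model residuals. This is only a sketch (ctrl_extract is my own name, and it relies on the engine exposing residuals()):
ctrl_extract <- control_resamples(
  extract = function(x) {
    # x is the workflow fitted on the fold's analysis set
    eng_fit <- extract_fit_engine(x)  # the underlying lm object
    tibble(train_rmse = sqrt(mean(residuals(eng_fit)^2)))
  }
)
lr_cv_tr <-
  lr_wf |>
  fit_resamples(
    folds,
    metrics = metric_set(rmse),
    control = ctrl_extract
  )
# One training RMSE per fold; collect_extracts() needs a recent tune version,
# otherwise unnest lr_cv_tr$.extracts by hand
lr_cv_tr |>
  collect_extracts()
But this feels engine-specific, and I don't see how to generalize it.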
For example, when tuning a KNN model to find the best number of neighbors, collect_metrics and show_best return the averaged validation-set metrics from cross-validation, whereas the classic way to spot the best number of neighbors is the point where the training error keeps decreasing while the validation error starts increasing. Unfortunately, the autoplot function shows only the validation errors, not the training errors.
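For concreteness, the KNN setup I mean is along these lines (knn_model, knn_wf, and knn_res are just illustrative names, and the grid range is arbitrary):
knn_model <-
  nearest_neighbor(neighbors = tune()) |>
  set_engine('kknn') |>
  set_mode('regression')
knn_wf <-
  workflow() |>
  add_recipe(basic_recipe) |>
  add_model(knn_model)
knn_res <-
  knn_wf |>
  tune_grid(
    resamples = folds,
    grid = grid_regular(neighbors(range = c(1L, 50L)), levels = 10),
    metrics = metric_set(rmse)
  )
knn_res |> show_best(metric = 'rmse')  # averaged validation RMSE only
autoplot(knn_res)                      # likewise shows only validation error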
The same happens in the decision-tree case, for example:
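For context, tree_model here must be a decision tree spec with all three hyperparameters marked for tuning, something like (rpart is the default engine):
tree_model <-
  decision_tree(
    cost_complexity = tune(),
    tree_depth = tune(),
    min_n = tune()
  ) |>
  set_engine('rpart') |>
  set_mode('regression')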
tree_grid <-
grid_regular(
cost_complexity(),
tree_depth(),
min_n(),
levels = c(3, 5, 10)
)
tree_wf <-
  workflow() |>
  add_model(tree_model) |>
  add_recipe(basic_recipe)
tree_res <-
  tree_wf |>
  tune_grid(
    resamples = folds,
    grid = tree_grid,
    metrics = metric_set(rmse),
    control = control
  )
How do we extract the training error for each combination of hyperparameters and fold?
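The only general approach I can think of is to refit by hand: finalize the workflow for each grid row, fit it on each fold's analysis set, and predict back on that same set. A sketch of what I mean (it assumes the outcome column is named y, train_rmse_one and train_errors are my own names, and it refits one model per grid row per fold, so it is slow):
train_rmse_one <- function(split, params) {
  dat <- analysis(split)            # the fold's training (analysis) set
  fitted <-
    tree_wf |>
    finalize_workflow(params) |>    # fill in the tune() placeholders
    fit(data = dat)
  preds <- predict(fitted, new_data = dat)
  rmse_vec(truth = dat$y, estimate = preds$.pred)
}
train_errors <-
  purrr::map_dfr(seq_len(nrow(tree_grid)), function(i) {
    tibble(
      grid_row = i,
      id = folds$id,
      train_rmse = purrr::map_dbl(
        folds$splits, train_rmse_one, params = tree_grid[i, ]
      )
    )
  })
Is there a built-in way to get these training errors instead?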