How to get the training error from fit_resamples and hyperparameter tuning?

maelfosso · January 8, 2025, 8:55am

During cross-validation, fit_resamples returns each metric averaged over the validation sets.

lr_model <-
  linear_reg() |>
  set_engine('lm')

lr_wf <-
  workflow() |>
  add_recipe(basic_recipe) |>
  add_model(lr_model)

lr_cv <-
  lr_wf |>
  fit_resamples(
    folds,
    metrics = metric_set(rmse),
    control = control
  )
  
# Let's extract the results from CV; that will help us compare this model with others
lr_cv |>
  collect_metrics()
# That's the RMSE validation error
# .metric .estimator  mean     n  std_err .config             
# <chr>   <chr>      <dbl> <int>    <dbl> <chr>               
# rmse    standard   0.161    10 0.000370 Preprocessor1_Model1

The issue I have is how to get the training error.

The same issue occurs after the tuning of hyperparameters.

For example, when tuning KNN to find the best number of neighbors, collect_metrics and show_best return the metrics averaged over the validation sets from cross-validation. However, as we know, the best number of neighbors is the point where the validation error starts increasing while the training error is still decreasing. Unfortunately, the autoplot function shows only the validation errors, not the training errors.

In this case, for example:

tree_grid <-
  grid_regular(
    cost_complexity(),
    tree_depth(),
    min_n(),
    levels = c(3, 5, 10)
  )

tree_wf <-
  workflow() %>%
  add_model(tree_model) %>%
  add_recipe(basic_recipe)

tree_res <- 
  tree_wf %>%
  tune_grid(
    resamples = folds,
    grid = tree_grid,
    metrics = metric_set(rmse),
    control = control
  )

How do we extract the training error for each combination of hyperparameters and fold?

Max · January 8, 2025, 1:19pm

Since you cross-posted to Stack Overflow, here is the answer: https://stackoverflow.com/questions/79338394/how-to-get-the-training-error-from-fit-resamples-and-hyperparameter-tuning/79339302#79339302

maelfosso · January 8, 2025, 6:38pm

Thank you, @Max, for your reply, but I have a problem.

You are reusing the whole training dataset instead of the training part of the data from the current fold.

How do I access the training part of the data of the current fold when running fit_resamples?

Max · January 10, 2025, 8:45pm

I just updated the SO thread for that.
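
For readers who do not follow the link, here is a minimal sketch of one way to get a per-fold training error. It is not necessarily what the Stack Overflow answer does. It refits the workflow by hand on the training part of each split, obtained with rsample's analysis(), and scores the fit on that same data. The mtcars data, the mpg ~ wt + hp formula, and the helper name fold_train_rmse are assumptions made up for illustration; they stand in for the data, recipe, and folds from the original post.

```r
library(tidymodels)

# Illustrative data and resamples (assumption: stand-ins for the thread's data).
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

wf <-
  workflow() |>
  add_recipe(recipe(mpg ~ wt + hp, data = mtcars)) |>
  add_model(linear_reg() |> set_engine("lm"))

# Hypothetical helper: fit the workflow on the analysis (training) part of
# one split, then compute RMSE on that same data.
fold_train_rmse <- function(split, wf) {
  train_data <- analysis(split)
  fitted_wf <- fit(wf, data = train_data)
  augment(fitted_wf, new_data = train_data) |>
    rmse(truth = mpg, estimate = .pred) |>
    pull(.estimate)
}

# One training RMSE per fold, plus the average across folds.
train_res <- tibble(
  id = folds$id,
  train_rmse = map_dbl(folds$splits, fold_train_rmse, wf = wf)
)

train_res
mean(train_res$train_rmse)
```

Because fit() is called on the workflow, the recipe is re-estimated on each fold's training part, as fit_resamples does internally. For tuning results, the same helper could in principle be applied to finalize_workflow(tree_wf, params) for each row of the grid, at the cost of refitting every candidate on every fold.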
