Influence of Metric Set on Workflow Tuning

When tuning workflows (recipe/model combinations) in a workflow set, how do the specified metric functions influence the tuning?

More detail:

  • We can create a few different recipes and store them in a list ("recipes_list")
  • We can specify a few different models and store them in a list ("model_specs"); a minimal sketch of both lists appears after the tuning code below
  • We can specify a list of metric functions (a metric set) to be used for tuning; for example:
metrics_tuning <- metric_set(yardstick::f_meas, yardstick::sensitivity, yardstick::specificity, 
                             yardstick::pr_auc, yardstick::roc_auc, yardstick::accuracy, 
                             yardstick::precision, yardstick::average_precision)
  • We can combine recipes and models into workflows and store them in a workflowset:
# configure the tuning
cv_folds <- vfold_cv(data_train, v = 8)   # number of cross-validation folds
tune_grid_size <- 50                      # number of hyperparameter combinations to try

# create a workflow_set
wf_set <- workflow_set(
  preproc = recipes_list,     # add multiple recipes, as appropriate
  models = model_specs,       # add multiple models, as appropriate
  cross = TRUE                # execute all combinations of recipes and models
)
  • Then we can tune the workflows in the workflowset (using different approaches):
# tune the models and update the workflow_set with ALL tuned results
wf_set_tuned_results <- workflow_map(
  wf_set,
  fn = "tune_race_anova",   # racing via a repeated-measures ANOVA model; a more efficient grid search
  verbose = TRUE,
  seed = 123,
  resamples = cv_folds,
  grid = tune_grid_size,
  metrics = metrics_tuning,
  control = control_race(verbose = TRUE, allow_par = TRUE, parallel_over = "everything",
                         save_pred = TRUE, save_workflow = TRUE)
)
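
For reference, a minimal sketch of what recipes_list and model_specs might contain (the outcome column, preprocessing steps, and model choices here are placeholders):

library(tidymodels)
library(finetune)   # provides tune_race_anova()

recipes_list <- list(
  basic      = recipe(outcome ~ ., data = data_train),
  normalized = recipe(outcome ~ ., data = data_train) %>%
    step_normalize(all_numeric_predictors())
)

model_specs <- list(
  rf  = rand_forest(mtry = tune(), min_n = tune()) %>%
    set_engine("ranger") %>%
    set_mode("classification"),
  xgb = boost_tree(trees = tune(), learn_rate = tune()) %>%
    set_engine("xgboost") %>%
    set_mode("classification")
)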

Given that context, what I'm trying to understand is how the list of metric functions influences the tuning. Does the tuning try to optimize all metrics at once? Does it try to optimize all metrics, but apply more weight to those specified earlier in the list? Or does it use only the first metric to guide the tuning (while including the others in the list lets you see their calculated values in the results)?

I thought that we were more explicit about this in the manual pages, but apparently we are not.

For racing or any other directed optimization (like Bayesian optimization or simulated annealing), the first metric in the metric set is used to guide the process. If you turn on any verbosity, you should see something like:

Racing will maximize the roc_auc metric.

The man pages for tune_bayes() and tune_sim_anneal() say:

The first metric in metrics is the one that will be optimized.

I'll add an issue to update the racing pages.

In any case, it does return all of the metrics that you declare, but only one is used for optimization.
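
As a sketch (the workflow id below is a placeholder): if you want a different metric to drive the search, list it first in the metric set; the remaining metrics are still computed and can be collected afterwards.

metrics_tuning <- metric_set(yardstick::roc_auc,   # the first metric guides racing / tune_bayes() / tune_sim_anneal()
                             yardstick::f_meas,
                             yardstick::accuracy)

# all of the declared metrics show up in the results
wf_set_tuned_results %>%
  extract_workflow_set_result(id = "recipe_1_rf") %>%   # placeholder workflow id
  collect_metrics()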

Perfect. Thank you for the quick reply, Max!

(Adding details for others with similar questions in the future.)

After tuning the workflowset with tune_race_anova, I created a loop that iterates over each tuned workflow and calls fit_best() to finalize and fit the underlying model using the best hyperparameters for a specified metric (metric_fit). (Note: metric_fit must be one of the metrics in the original metric set (metrics_tuning) used for tuning.)

    # inside the loop, for each wflow_ID in wf_set_tuned_results$wflow_id
    workflow_fitted <- wf_set_tuned_results %>%
      extract_workflow_set_result(id = wflow_ID) %>%    # tuning results for this workflow
      fit_best(metric = metric_fit, verbose = TRUE)     # finalize + fit with the best config for metric_fit
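
Each fitted workflow can then be used to predict on new data (data_test is a placeholder for a hold-out set):

    preds_new <- predict(workflow_fitted, new_data = data_test)   # predictions from the finalized fit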

Changing metric_fit from one metric function (e.g., accuracy) to another (e.g., sensitivity) does not change the results of predicting on new data—just as Max stated here.

HOWEVER, after tuning the workflowset with a regular tune_grid and then fitting each workflow model with different values of metric_fit, I DO get different prediction results.
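
One way to see where that difference comes from is to check which candidate configuration each metric selects from a given workflow's tuning results (a sketch, using the objects above):

res <- wf_set_tuned_results %>%
  extract_workflow_set_result(id = wflow_ID)

select_best(res, metric = "accuracy")      # configuration fit_best() would finalize for accuracy
select_best(res, metric = "sensitivity")   # configuration fit_best() would finalize for sensitivity

If the two calls return different rows, fit_best() finalizes different hyperparameters, which then shows up as different predictions.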
