I've successfully tuned several models using {tidymodels} and {workflowsets}. However, when testing on the validation dataset with tune::last_fit(), the parameters obtained by tune::select_best() don't perform well. This makes me want to manually test other sets of parameters on the validation set. I find tune::show_best() and tune::select_best() very limited for doing so, since they only consider one metric when choosing the parameters. I've managed to filter the tibbles with more complex logic involving several metrics using pure {dplyr}, but this is not optimal and is time consuming, since it involves manually finalizing each model every time I want to test one of them.
Is there a way to cherry-pick a set of parameters based on some id (for example, the tune_bayes() iteration number)?
It would also be really helpful if tune::select_best() could take more conditions when picking a model.
This is the classical process to get the "best" set of parameters (which, unfortunately, is not the best in my case, since I get a model with a very high roc_auc but a very bad spec, for example).
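Roughly something like this, where `tune_res`, `wf`, and `split_df` are placeholders for my tuning result, workflow, and data split:

```r
library(tidymodels)

# pick the single "best" configuration by one metric only
best_params <- select_best(tune_res, metric = "roc_auc")

# plug those values into the workflow and evaluate on the held-out data
final_wf <- finalize_workflow(wf, best_params)
test_fit <- last_fit(final_wf, split = split_df)

collect_metrics(test_fit)
```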
You can pass any parameters you want to finalize_workflow(). The parameters argument takes any tibble that includes values for the tuning parameters.
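A rough sketch of what that could look like, assuming the result came from tune_bayes() and the tuning parameters are called mtry and min_n (placeholder names; substitute your own):

```r
library(tidymodels)

# grab a candidate by its id rather than by a single metric
picked <- collect_metrics(tune_res) %>%
  filter(.config == "Iter12") %>%   # or filter(.iter == 12) for tune_bayes() results
  distinct(mtry, min_n)             # keep only the tuning parameter columns

final_wf <- finalize_workflow(wf, picked)
```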
I do want to say that you are probably going to end up overfitting by taking this approach. It's unclear what is happening; the word "validation" makes sense but the code makes me think that you are going to repeatedly check against the test set. There's no code to suggest how split_df was made.
We do have an experimental package called desirability2 that uses a tool called desirability functions to do multi-metric optimization (also used here). There is an example on the package website.
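A rough sketch of the idea, with arbitrary lower/upper bounds that you would adjust for your own metrics:

```r
library(tidymodels)
library(desirability2)

collect_metrics(tune_res) %>%
  select(.config, .metric, mean) %>%
  pivot_wider(names_from = .metric, values_from = mean) %>%
  mutate(
    # map each metric onto [0, 1], where larger is better
    d_roc  = d_max(roc_auc, 0.5, 1.0),
    d_sens = d_max(sens, 0.5, 1.0),
    d_spec = d_max(spec, 0.5, 1.0),
    # combine them into a single overall desirability score
    d_all  = d_overall(across(starts_with("d_")))
  ) %>%
  arrange(desc(d_all))
```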
Thanks for your response, Max. I probably wasn't clear, so let me clarify my question.
My dataset is split manually into training and testing sets due to the nature of the data.
I've used a workflow set with resampling of the training dataset (using k-fold CV) to tune the parameters of a bunch of models.
When I explore the resulting object with collect_metrics(), I can see that some models have very good and balanced metrics, while others have one very good metric but awful estimates for the rest. This leads to select_best() choosing a model with a very high roc_auc but bad sens, or one with very good sens but bad spec, and so on.
That's why I'm interested in manually picking a set of results that I can see has balanced metrics (for example, a set with roc_auc, sens, and spec all >= 0.8).
My intention is to then finalize the model and run last_fit() to evaluate its performance on the test set.
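For context, this is roughly what I'm doing by hand today (with `wflow_res` as my workflow set result, `split_df` as my split, and mtry/min_n standing in for whichever parameters a given workflow tunes):

```r
library(tidymodels)

# keep only candidates that are balanced across several metrics
balanced <- collect_metrics(wflow_res) %>%
  select(wflow_id, .config, .metric, mean) %>%
  pivot_wider(names_from = .metric, values_from = mean) %>%
  filter(roc_auc >= 0.8, sens >= 0.8, spec >= 0.8)

chosen <- balanced %>% slice(1)

# recover the tuning parameter values for that candidate
params <- extract_workflow_set_result(wflow_res, chosen$wflow_id) %>%
  collect_metrics() %>%
  filter(.config == chosen$.config) %>%
  distinct(mtry, min_n)   # replace with that workflow's own tuning parameters

# finalize and evaluate once on the test set
test_res <- extract_workflow(wflow_res, chosen$wflow_id) %>%
  finalize_workflow(params) %>%
  last_fit(split = split_df)

collect_metrics(test_res)
```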