Is there any other tool to extract parameter sets after tuning besides select_best()?

I've successfully tuned several models using {tidymodels} and {workflowsets}. However, when I evaluate on the validation dataset with tune::last_fit(), the parameters chosen by tune::select_best() don't perform well. This makes me want to manually test other parameter sets on the validation set. I find tune::show_best() and tune::select_best() very limited for this, since they only consider one metric when choosing the parameters. I've managed to filter the tibbles with more complex logic involving several metrics using pure {dplyr}, but this is not optimal and is time consuming, since it involves manually finalizing each model every time I want to test one of them.

Is there a way to cherry-pick a set of parameters based on some id (for example, the tune_bayes() iteration number)?

It would also be really helpful if tune::select_best() could take more conditions when picking a model.

This is the classic process to get the "best" set of parameters (which unfortunately isn't the best in my case, since I end up with a model that has a very high roc_auc but a very bad spec, for example):

# Tune each workflow in the set with Bayesian optimization
models_tuned <- models %>% 
  workflow_map("tune_bayes",
               resamples = cv_folds,
               initial = 20,
               iter = 10,
               metrics = mm_metrics,
               verbose = TRUE)

# Pick the "best" parameters for one workflow based on a single metric
best_results <- models_tuned %>% 
  extract_workflow_set_result(id = "norm_nnet") %>% 
  select_best(metric = "accuracy")

# Finalize that workflow and evaluate it on the test portion of the split
fitted_workflow <- models_tuned %>%
  extract_workflow(id = "norm_nnet") %>%
  finalize_workflow(best_results) %>% 
  last_fit(split = split_df,
           metrics = mm_metrics)
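
And this is roughly the kind of manual filtering I've been doing instead, e.g. grabbing a specific candidate by its tune_bayes() iteration (a sketch; the .iter value is just an example):

# The parameter values sit next to .config and, for tune_bayes(), .iter
# in the collect_metrics() output, so a candidate can be pulled out by id
nnet_metrics <- models_tuned %>% 
  extract_workflow_set_result(id = "norm_nnet") %>% 
  collect_metrics()

picked_params <- nnet_metrics %>% 
  filter(.iter == 7) %>%                              # e.g. Bayes iteration 7
  select(-.metric, -.estimator, -mean, -n, -std_err) %>% 
  distinct()

That one-row tibble (minus the id columns if needed) then goes through the same finalize_workflow()/last_fit() steps as above, which gets tedious when repeated across candidates and models.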

You can pass any parameters you want to finalize_workflow(). The parameters argument takes any tibble that includes values for the tuning parameters.
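
For example, a hand-built tibble works just like the output of select_best(); here hidden_units and penalty are placeholders for whatever parameters were actually tuned in norm_nnet:

# Any tibble with one value per tuning parameter can stand in for the
# select_best() result (parameter names below are placeholders)
manual_params <- tibble(hidden_units = 5, penalty = 0.01)

models_tuned %>% 
  extract_workflow(id = "norm_nnet") %>% 
  finalize_workflow(manual_params) %>% 
  last_fit(split = split_df, metrics = mm_metrics)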

I do want to say that you are probably going to end up overfitting by taking this approach. It's unclear what is happening; the word "validation" makes sense but the code makes me think that you are going to repeatedly check against the test set. There's no code to suggest how split_df was made.

We do have an experimental package called desirability2 that uses a technique called desirability functions to do multi-metric optimization (also used here). There is an example on the package website.
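
A rough sketch of that idea, based on the pattern shown on the package website (the metric names and the low/high cut-offs below are illustrative, and the exact arguments should be checked against the desirability2 documentation):

library(desirability2)

# One row per candidate, one column per metric
wide_metrics <- models_tuned %>% 
  extract_workflow_set_result(id = "norm_nnet") %>% 
  collect_metrics() %>% 
  select(.config, .metric, mean) %>% 
  tidyr::pivot_wider(names_from = .metric, values_from = mean)

# Score each metric on a 0-1 desirability scale and combine the scores
wide_metrics %>% 
  mutate(
    d_roc  = d_max(roc_auc, low = 0.5, high = 1),
    d_sens = d_max(sens, low = 0.5, high = 1),
    d_spec = d_max(spec, low = 0.5, high = 1),
    d_all  = d_overall(across(starts_with("d_")))
  ) %>% 
  arrange(desc(d_all))

The top-ranked .config can then be matched back to its parameter values and finalized as usual.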

Thanks for your response, Max. I probably wasn't clear, so let me clarify my question.
My dataset is split manually into training and testing sets due to the nature of the data.

I've used a workflow set with resampling of the training dataset (using k-fold CV) to tune the parameters of a bunch of models.

When I explore the resulting object with collect_metrics(), I can see that I have some models with very good, balanced metrics, and others that do very well on one metric while having awful estimates for the others. This leads select_best() to choose a model with a very high roc_auc but bad sens, or one with a very good sens but bad spec, etc.

That's why I'm interested in manually picking a set of results with balanced metrics (for example, one with roc_auc, sens, and spec all >= 0.8).
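
In code, that selection would look roughly like this (the 0.8 cut-offs are just an example, and the metric column names depend on mm_metrics):

candidate_configs <- models_tuned %>% 
  extract_workflow_set_result(id = "norm_nnet") %>% 
  collect_metrics() %>% 
  select(.config, .metric, mean) %>% 
  tidyr::pivot_wider(names_from = .metric, values_from = mean) %>% 
  filter(roc_auc >= 0.8, sens >= 0.8, spec >= 0.8)   # keep balanced candidates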

My intention is to then finalize the model and run last_fit() to evaluate its performance on the test set.

I hope I'm being more clear this time.

Thanks for your support!
