Understanding tune(), update_model(), and parameters()

I'm following the excellent tidymodels workshop materials on tuning by @apreshill and @garrett (from slide 40 in the tune deck). I think I'm missing something about how tuning works. In the example I modified below, I stick tune() placeholders in the recipe and model specifications and then build the workflow. When I run tune_grid() I get the following error:

Error: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use parameters() to finalize the parameter ranges.

Hill and Grolemund don't use parameters() in their example. They do something a bit different:

  1. Define recipe/model/workflow without tune()
  2. Define a new model with tune() and update the workflow with update_model()
  3. Fit with tune_grid()
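As I understand it, those three steps look roughly like the following. This is my own sketch of the pattern, not the workshop code itself: the 300-row subset, the two folds, `grid = 5`, and the names `so_small`, `tune_spec`, and `tune_wf` are assumptions made only to keep it short and quick to run.

```r
library(tidymodels)

data("stackoverflow", package = "modeldata")
set.seed(100)
so_small <- sample_n(stackoverflow, size = 300)  # small subset, for speed only
so_folds <- vfold_cv(so_small, v = 2, strata = Remote)

# 1. Recipe, model, and workflow with no tune() placeholders
so_rec <- recipe(Remote ~ ., data = so_small) %>%
  step_dummy(all_nominal(), -all_outcomes())

rf_spec <- rand_forest() %>%
  set_engine("ranger") %>%
  set_mode("classification")

rf_wf <- workflow() %>%
  add_recipe(so_rec) %>%
  add_model(rf_spec)

# 2. A second model spec with tune(), swapped in via update_model()
tune_spec <- rand_forest(mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

tune_wf <- rf_wf %>% update_model(tune_spec)

# 3. Tune over the resamples (grid = 5 asks for five candidate combinations)
tune_results <- tune_wf %>% tune_grid(resamples = so_folds, grid = 5)
```

Only the model carries tune() placeholders here; the recipe has none.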

Here's what I tried:

library(tidymodels) # attaches rsample, recipes, parsnip, workflows, tune, tidyr
library(modeldata)
data(stackoverflow)
set.seed(100) # Important!

so_split <- initial_split(stackoverflow, strata = Remote)
so_train <- training(so_split)
so_test  <- testing(so_split)
so_folds <- vfold_cv(so_train, v = 10, strata = Remote)

so_rec <- recipe(Remote ~ ., 
                 data = so_train) %>% 
  step_dummy(all_nominal(), -all_outcomes()) %>% 
  step_lincomb(all_predictors()) %>% 
  step_downsample(Remote, under_ratio = tune())

rf_spec <- 
  rand_forest(mtry = tune(),
              min_n = tune()) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

rf_wf <-
  workflow() %>% 
  add_recipe(so_rec) %>% 
  add_model(rf_spec)

tuneParam <- expand_grid(under_ratio = c(1, 1.1, 1.2))

rf_results <-
  rf_wf %>% 
  tune_grid(resamples = so_folds,
            grid=tuneParam)

This generates the error referenced.

EDIT 1: It seems like the error gets thrown by tune() in the recipe. When I run the above without the tune() placeholder in the recipe (and without passing grid = tuneParam to tune_grid()), it works. So I'm wondering how parameters() fits in when you want to tune a recipe parameter.

Hi there,

I'm getting the same error when I try to tune both the recipe and the model. Perhaps this deserves a GitHub issue with a full reprex? Here is mine (sorry for not rendering it, I'm getting an error trying to run the reprex locally :sob:)

library(modeldata)
data("stackoverflow")
library(tidyverse)
library(tidymodels)
set.seed(100) # Important!

# make smaller to save time
so_split <- initial_split(sample_n(stackoverflow, size = 300), strata = Remote)
so_train <- training(so_split)
so_test  <- testing(so_split)
# again, simpler so runs faster
so_folds <- vfold_cv(so_train, v = 2, strata = Remote)

so_rec <- recipe(Remote ~ ., 
                   data = so_train) %>% 
  step_dummy(all_nominal(), -all_outcomes()) %>% 
  step_lincomb(all_predictors()) %>% 
  step_downsample(Remote)

tune_rec <- recipe(Remote ~ ., 
                 data = so_train) %>% 
  step_dummy(all_nominal(), -all_outcomes()) %>% 
  step_lincomb(all_predictors()) %>% 
  step_downsample(Remote, under_ratio = tune())

# no tuning
rf_spec <- 
  rand_forest() %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

# add tuning
tune_spec <-
  rand_forest(mtry = tune(),
              min_n = tune()) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

# define 3 workflows
tuneboth_wf <-
  workflow() %>% 
  add_recipe(tune_rec) %>% 
  add_model(tune_spec)

tunerec_wf <-
  workflow() %>% 
  add_recipe(tune_rec) %>% 
  add_model(rf_spec)

tunemod_wf <-
  workflow() %>% 
  add_recipe(so_rec) %>% 
  add_model(tune_spec)

# this won't work
tuneboth_wf %>% 
  tune_grid(resamples = so_folds)

# works
tunerec_wf %>% 
  tune_grid(resamples = so_folds)

# works
tunemod_wf %>% 
  tune_grid(resamples = so_folds)

Thanks @apreshill. Filed an issue on GitHub.

Answered on GitHub. The issue is that mtry has no predefined upper bound, because the bound depends on the number of predictor columns in the data. The error message and the documentation explain the problem and how to avoid it.
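To make the missing upper bound concrete, here is a small sketch using dials on its own. The `mtcars` predictors are only a stand-in data set (an assumption for illustration, not part of the original example), and the names `mtry_raw`, `predictors`, and `mtry_final` are made up for this snippet.

```r
library(dials)

# Fresh from dials, mtry() has a known lower bound but an unknown upper bound
mtry_raw <- mtry()
mtry_raw
has_unknowns(mtry_raw)

# finalize() fills in the upper bound from the number of predictor columns
predictors <- mtcars[, -1]  # stand-in data: the 10 non-outcome columns of mtcars
mtry_final <- finalize(mtry_raw, predictors)
range_get(mtry_final)
has_unknowns(mtry_final)
```

The point is that finalize() needs an actual predictor set to look at, so the bound cannot be filled in until the number of columns is known.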

From the issue:

tunemod_wf doesn't fail since it does not have tuning parameters in the recipe. In that case it knows the dimensions of the data (since the recipe can be prepared) and can run finalize() without any ambiguity.

If the recipe has tuning parameters, it cannot be prepared beforehand, so the number of predictors is unknown and the model parameters cannot be finalized automatically.
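For anyone landing here later, one way to get the tune-both workflow running is to give mtry an explicit range yourself and pass the resulting parameter set to tune_grid() through `param_info`. The sketch below rests on several assumptions: it uses current package releases, where I believe step_downsample() lives in themis and extract_parameter_set_dials() plays the role that parameters() plays in this thread; the upper bound of 10 for mtry is an arbitrary illustrative value; and `so_small`, `rf_param`, and `grid = 5` are choices made only for this snippet.

```r
library(tidymodels)
library(themis)  # assumption: step_downsample() is provided by themis in current releases

data("stackoverflow", package = "modeldata")
set.seed(100)
so_small <- sample_n(stackoverflow, size = 300)  # small subset, for speed only
so_folds <- vfold_cv(so_small, v = 2, strata = Remote)

tune_rec <- recipe(Remote ~ ., data = so_small) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_lincomb(all_predictors()) %>%
  step_downsample(Remote, under_ratio = tune())

tune_spec <- rand_forest(mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

tuneboth_wf <- workflow() %>%
  add_recipe(tune_rec) %>%
  add_model(tune_spec)

# Give mtry an explicit range so nothing is left to finalize.
# The upper bound of 10 is an illustrative guess, not a recommendation.
# (Older releases, as in this thread, would call parameters() here instead.)
rf_param <- tuneboth_wf %>%
  extract_parameter_set_dials() %>%
  update(mtry = mtry(range = c(1L, 10L)))

rf_results <- tuneboth_wf %>%
  tune_grid(resamples = so_folds, grid = 5, param_info = rf_param)
```

The range you pick for mtry should not exceed the number of predictors the recipe can produce, so it is worth checking the prepared column count for your own data before settling on a bound.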
