Finalizing spline workflow with multiple tuned parameters erroring

See the training data at this GitHub Gist. I'm trying out a very simple model with just a spline interacted with a nominal variable. Using step_bs, I set up the recipe:

rec_bs <- recipe(stars ~ date + film, dat_train) %>% 
  step_bs(date, deg_free = tune(), degree = tune()) %>% 
  step_dummy(film) %>% 
  step_interact(~ starts_with("date"):starts_with("film"))

Note I'm tuning both the degrees of freedom (deg_free) and the degree of the polynomial (degree).

My model is just vanilla OLS:

lm_mod <- linear_reg() %>% 
  set_engine("lm")

And the workflow:

wf_bs <- workflow() %>% 
  add_model(lm_mod) %>% 
  add_recipe(rec_bs)

I run 10-fold cross-validation over a grid to pick the best parameters:

grid_bs <- tibble(deg_free = rep(4:10, 3), degree = rep(2:4, each = 7))
folds <- vfold_cv(dat_train, v = 10)
cv_bs <- wf_bs %>% 
  tune_grid(resamples = folds, grid = grid_bs)
best_bs <- select_by_one_std_err(cv_bs, deg_free, degree, metric = "rsq")
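
The grid above pairs every deg_free from 4 to 10 with every degree from 2 to 4. As a small sketch in base R (grid_alt and grid_manual are names made up for this illustration), expand.grid() builds the same 21 combinations without the hand-written rep() calls:

```r
# Every combination of the two tuning parameters; the first column varies fastest.
grid_alt <- expand.grid(deg_free = 4:10, degree = 2:4)

# The hand-written version from the post, as a plain data frame for comparison.
grid_manual <- data.frame(
  deg_free = rep(4:10, 3),
  degree   = rep(2:4, each = 7)
)

# Both grids hold the same rows in the same order.
same_rows <- all(grid_alt$deg_free == grid_manual$deg_free) &&
  all(grid_alt$degree == grid_manual$degree)
print(nrow(grid_alt))
print(same_rows)
```

Wrapping the result in tibble::as_tibble() should give the same kind of object as the original grid_bs, if a tibble is preferred.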

The selected row, best_bs, looks like this:

> best_bs
# A tibble: 1 x 9
  deg_free degree .metric .estimator  mean     n std_err .best .bound
     <int>  <int> <chr>   <chr>      <dbl> <int>   <dbl> <dbl>  <dbl>
1        4      2 rsq     standard   0.189    10 0.00729 0.190  0.183

So, I want to finalize the workflow with both deg_free and degree. But I get an error:

> wf_bs %>% 
+   finalize_workflow(best_bs) # erroring here
Error in names(param) <- pset$name : 
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning message:
In pset$component_id == step_ids :
  longer object length is not a multiple of shorter object length
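
For reference, both messages are ordinary base R length-mismatch conditions. The sketch below reproduces the same wording outside of tidymodels; it only illustrates what the messages mean and is not a claim about what finalize_workflow() does internally (param here is a stand-in object, not the real one):

```r
# Assigning two names to a length-1 object raises the same error text.
param <- list(4)
err <- tryCatch(
  {
    names(param) <- c("deg_free", "degree")
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Comparing vectors whose lengths are not multiples of each other
# raises the accompanying warning.
wrn <- tryCatch(
  c("a", "b", "c") == c("a", "b"),
  warning = function(w) conditionMessage(w)
)
print(wrn)
```

So somewhere a single value is being matched against two parameter names, which is consistent with the one-parameter calls succeeding.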

It looks like it's expecting only one parameter, but I already told it to tune for two. You can see this by calling parameters():

> parameters(wf_bs)
Collection of 2 parameters for tuning

       id parameter type object class
 deg_free       deg_free    nparam[+]
   degree         degree    nparam[+]

What's weird is if you just give it one column, it works fine:

> wf_bs %>% 
+   finalize_workflow(best_bs[1])
══ Workflow ═══════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ───────────────────────────────────────────────────────────────────────
3 Recipe Steps

● step_bs()
● step_dummy()
● step_interact()

── Model ──────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 

> # predict ----------------------------------------------------------------------
> wf_bs %>% 
+   finalize_workflow(best_bs[2])
══ Workflow ═══════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ───────────────────────────────────────────────────────────────────────
3 Recipe Steps

● step_bs()
● step_dummy()
● step_interact()

── Model ──────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 

Any ideas on what's going on here, or how I can tell it that I want to supply both deg_free and degree? I get the same behavior if I use a list instead of a tibble:

wf_bs %>% 
  finalize_workflow(list(degree = 2))

That works like above, but again I see an error when I try to supply both:

> wf_bs %>% 
+   finalize_workflow(list(degree = 2, deg_free = 4))
Error in names(param) <- pset$name : 
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning message:
In pset$component_id == step_ids :
  longer object length is not a multiple of shorter object length
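
One possible workaround, sketched under assumptions: skip finalize_workflow() and write the selected values (deg_free = 4, degree = 2 from best_bs) directly into a fresh recipe. The data frame dat_toy below is simulated purely so the example is self-contained; it is not the Gist data, and rec_final and wf_final are names invented for this sketch.

```r
library(tidymodels)

# Simulated stand-in for dat_train (hypothetical values, same column names).
set.seed(1)
dat_toy <- tibble(
  date  = runif(200),
  film  = sample(c("a", "b", "c"), 200, replace = TRUE),
  stars = rnorm(200)
)

# Same recipe as before, with fixed values in place of tune().
rec_final <- recipe(stars ~ date + film, dat_toy) %>%
  step_bs(date, deg_free = 4, degree = 2) %>%
  step_dummy(film) %>%
  step_interact(~ starts_with("date"):starts_with("film"))

wf_final <- workflow() %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  add_recipe(rec_final)

# Fit on the (toy) training data.
wf_fit <- fit(wf_final, dat_toy)
```

This loses the convenience of passing best_bs straight through, but it avoids the finalization step entirely.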

It worked for me. Can you try it via reprex()?

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(readr)
#> 
#> Attaching package: 'readr'
#> The following object is masked from 'package:yardstick':
#> 
#>     spec
#> The following object is masked from 'package:scales':
#> 
#>     col_factor

# https://gist.githubusercontent.com/markhwhiteii/7e5524c0332bf1b55de8d53b4ac499c5/raw/17895e604344ab29bdb478e92a872e1b320ec12d/dat_train.csv
dat_train <- read_csv("dat_train.csv")
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   film = col_character(),
#>   date = col_double(),
#>   stars = col_double()
#> )

rec_bs <- recipe(stars ~ date + film, dat_train) %>% 
  step_bs(date, deg_free = tune(), degree = tune()) %>% 
  step_dummy(film) %>% 
  step_interact(~ starts_with("date"):starts_with("film"))

lm_mod <- linear_reg() %>% 
  set_engine("lm")

wf_bs <- workflow() %>% 
  add_model(lm_mod) %>% 
  add_recipe(rec_bs)

grid_bs <- tibble(deg_free = rep(4:10, 3), degree = rep(2:4, each = 7))
folds <- vfold_cv(dat_train, v = 10)
cv_bs <- wf_bs %>% 
  tune_grid(resamples = folds, grid = grid_bs)
best_bs <- select_by_one_std_err(cv_bs, deg_free, degree, metric = "rsq")

wf_bs %>%  finalize_workflow(best_bs)
#> ══ Workflow ════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 3 Recipe Steps
#> 
#> ● step_bs()
#> ● step_dummy()
#> ● step_interact()
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm

Created on 2021-05-04 by the reprex package (v1.0.0.9000)
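
Since the identical code finalizes cleanly here, a difference in installed package versions is one possible explanation. That is an assumption, not something confirmed in this thread. A small base R sketch for comparing versions across machines (the package list is a guess at the relevant ones, and installed_version is a helper defined only for this example):

```r
# Packages that plausibly matter for tuning and finalizing a workflow.
pkgs <- c("tune", "recipes", "workflows", "dials", "parsnip")

# Return the installed version as a string, or NA if the package is missing.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(pkgs, installed_version, character(1))
print(versions)
```

Running this in both sessions and comparing the output would show whether the two setups differ before digging further.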
