Stacking models with different recipes impossible

I am trying to stack multiple models for a binary classification problem. Some models require factors to be converted to dummy variables while others do not, so I use different recipes during preprocessing.
Although the models fit just fine, I cannot seem to stack them: stacks raises an error in add_candidates() whenever the candidates being added use different recipes:

Error:
! It seems like the new candidate member 'tune_list$lightgbm$tuning_results' doesn't make use of the same resampling object as the existing candidates.

Is this expected behaviour? I would expect such a stack to be possible. I tried replicating my code as closely as possible with this minimal example below:

library(tidymodels)
library(stacks)
library(bonsai)
library(lightgbm)
#> Loading required package: R6
#> 
#> Attaching package: 'lightgbm'
#> The following object is masked from 'package:dplyr':
#> 
#>     slice

df <- credit_data |> 
  as_tibble() |> 
  tibble::rowid_to_column("id")

tune_wflow <- function(model_data,
                       model_type) {
  
  # train/test split
  train_test_split <- rsample::initial_split(model_data, strata = "Marital")
  train_set <- rsample::training(train_test_split)
  
  # set up recipe
  base_rec <- recipes::recipe(Status ~ ., data = train_set) |>
    recipes::update_role(id, new_role = "id") |>
    recipes::update_role(Marital, new_role = "strata") |>
    # remove NA
    recipes::step_naomit(recipes::all_predictors())
  
  rec <- if (model_type == "xgboost") {
    base_rec |>
      recipes::step_dummy(recipes::all_nominal_predictors()) |>
      # convert remaining non-numeric predictors
      recipes::step_mutate(
        dplyr::across(tidyselect:::where(is.logical), as.double)
      )
  } else if (model_type == "lightgbm") {
    base_rec
  }

  # set up model parameters
  mod <- if (model_type == "xgboost") {
    parsnip::boost_tree(
      min_n = tune::tune(),
      mtry = tune::tune(),
      tree_depth = tune::tune(),
      learn_rate = tune::tune(),
      sample_size = tune::tune()
    ) |>
      parsnip::set_engine("xgboost") |>
      parsnip::set_mode("classification")
    
  } else if (model_type == "lightgbm") {
    parsnip::boost_tree(
      min_n = tune::tune(),
      mtry = tune::tune(),
      tree_depth = tune::tune(),
      learn_rate = tune::tune(),
      sample_size = tune::tune()
    ) |>
      parsnip::set_engine("lightgbm") |>
      parsnip::set_mode("classification")
  }
  
  # bind recipe and model specs to workflow
  wflow <- workflows::workflow() |>
    workflows::add_recipe(rec) |>
    workflows::add_model(mod)
  
  # set up classification metrics
  metrics <- yardstick::metric_set(
    yardstick::accuracy,
    yardstick::roc_auc)
  
  # set up parallelization
  cl <- parallel::makePSOCKcluster(parallel::detectCores())
  doParallel::registerDoParallel(cl)
  
  tune <- tune::tune_grid(
    wflow,
    resamples = rsample::vfold_cv(
      train_set,
      v = 5,
      strata = "Marital"),
    grid = 5, 
    metrics = metrics,
    control = stacks::control_stack_grid()
  )
  closeAllConnections()
  
  # return tuned results
  return(
    list(tuning_results = tune)
  )
}

# fit ----
models <- c("xgboost", "lightgbm")

tune_list <- list()
# fitting loop
for (model in models) {
  tuning_results <- tune_wflow(df, model_type = model)
  tune_list[[model]] <- tuning_results
}

# stacking ----
mystack <- 
  stacks() |>
  add_candidates(tune_list$xgboost$tuning_results) |>
  add_candidates(tune_list$lightgbm$tuning_results)

#> Error:
#> ! It seems like the new candidate member 'tune_list$lightgbm$tuning_results' doesn't make use of the same resampling object as the existing candidates.
#> Run `rlang::last_error()` to see where the error occurred.

Created on 2022-11-21 with reprex v2.0.2

Thanks for the reprex, @stargeysir!

The issue here isn't related to the use of different recipes—that's fair game! As the error notes, the two tuning results you'd like to stack "don't make use of the same resampling object." That is, you call:

rsample::vfold_cv(
      train_set,
      v = 5,
      strata = "Marital")

...on each iteration of the loop, so each model is cross-validated on a different set of folds (initial_split() is called again each time as well, so even the training sets differ). In ensembling, every candidate member must be evaluated on the same, fixed set of resamples.
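To illustrate why two separate calls produce different resampling objects, here is a minimal sketch. It uses the built-in mtcars data rather than your credit data, and the object names (folds_a, folds_b, folds_c) are only for illustration. It compares the row indices of the first fold rather than reproducing the exact check that stacks performs internally.

```r
library(rsample)

set.seed(123)
folds_a <- vfold_cv(mtcars, v = 5)
# A second call draws fresh random folds
folds_b <- vfold_cv(mtcars, v = 5)

# Row indices used for analysis in the first fold of each object
in_a <- folds_a$splits[[1]]$in_id
in_b <- folds_b$splits[[1]]$in_id

# The two calls are expected to disagree, since the folds are random
identical(in_a, in_b)

# Resetting the seed reproduces the first set of folds
set.seed(123)
folds_c <- vfold_cv(mtcars, v = 5)
in_c <- folds_c$splits[[1]]$in_id
identical(in_a, in_c)
```

Creating the folds once and passing the same object to every tuning call avoids relying on seeds at all.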

Here's a version of your code that splits the function in two, one part that handles data resampling and one that handles model fitting, so that the same resamples are used for both models.

library(tidymodels)
library(stacks)
library(bonsai)
library(lightgbm)

df <- credit_data |> 
  as_tibble() |> 
  tibble::rowid_to_column("id")


resample_data <- function(model_data) {
  # train/test split
  train_test_split <- rsample::initial_split(model_data, strata = "Marital")
  train_set <- rsample::training(train_test_split)
  
  resamples <- rsample::vfold_cv(train_set, v = 5, strata = "Marital")
  
  list(train_set = train_set, resamples = resamples)
}

evaluate_model <- function(splits, model_type) {
  # set up recipe
  base_rec <- recipes::recipe(Status ~ ., data = splits[["train_set"]]) |>
    recipes::update_role(id, new_role = "id") |>
    recipes::update_role(Marital, new_role = "strata") |>
    # remove NA
    recipes::step_naomit(recipes::all_predictors())
  
  rec <- if (model_type == "xgboost") {
    base_rec |>
      recipes::step_dummy(recipes::all_nominal_predictors()) |>
      # convert remaining non-numeric predictors
      recipes::step_mutate(
        dplyr::across(tidyselect:::where(is.logical), as.double)
      )
  } else if (model_type == "lightgbm") {
    base_rec
  }
  
  # set up model parameters
  mod <- if (model_type == "xgboost") {
    parsnip::boost_tree(
      min_n = tune::tune(),
      mtry = tune::tune(),
      tree_depth = tune::tune(),
      learn_rate = tune::tune(),
      sample_size = tune::tune()
    ) |>
      parsnip::set_engine("xgboost") |>
      parsnip::set_mode("classification")
    
  } else if (model_type == "lightgbm") {
    parsnip::boost_tree(
      min_n = tune::tune(),
      mtry = tune::tune(),
      tree_depth = tune::tune(),
      learn_rate = tune::tune(),
      sample_size = tune::tune()
    ) |>
      parsnip::set_engine("lightgbm") |>
      parsnip::set_mode("classification")
  }
  
  # bind recipe and model specs to workflow
  wflow <- workflows::workflow() |>
    workflows::add_recipe(rec) |>
    workflows::add_model(mod)
  
  # set up classification metrics
  metrics <- yardstick::metric_set(
    yardstick::accuracy,
    yardstick::roc_auc)
  
  # set up parallelization
  cl <- parallel::makePSOCKcluster(parallel::detectCores())
  doParallel::registerDoParallel(cl)
  
  tune <- tune::tune_grid(
    wflow,
    resamples = splits[["resamples"]],
    grid = 5, 
    metrics = metrics,
    control = stacks::control_stack_grid()
  )
  # shut the workers down cleanly and fall back to sequential execution
  parallel::stopCluster(cl)
  foreach::registerDoSEQ()
  
  # return tuned results
  return(
    list(tuning_results = tune)
  )
}

tune_wflow <- function(model_data,
                       model_types) {
  splits <- resample_data(model_data)
  
  lapply(model_types, evaluate_model, splits = splits)
}

# fit ----
models <- c("xgboost", "lightgbm")

tune_list <- tune_wflow(df, models)

# stacking ----
mystack <- 
  stacks() |>
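  # results come back in the order of `models`: xgboost first, then lightgbm
  add_candidates(tune_list[[1]][[1]], name = "xgb") |>
  add_candidates(tune_list[[2]][[1]], name = "lgb")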

I'll note that the workflowsets package is designed to accommodate many-model workflows like yours and is compatible with stacks. Its interfaces may help ease friction when passing around many model objects and avoid common pitfalls around data usage.
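As a rough sketch of what that could look like: the example below pairs two different recipes with two models in a workflow set and evaluates both on one shared set of folds. To keep it light, it swaps in a decision tree (rpart) and a logistic regression (glm) instead of the xgboost and lightgbm specifications from the thread, drops incomplete rows up front, and resamples without tuning. The object names are illustrative, and this is one possible arrangement rather than the recommended one.

```r
library(tidymodels)
library(stacks)

set.seed(1)
# Assumption for brevity: drop incomplete rows instead of handling NAs in the recipe
credit <- na.omit(modeldata::credit_data)

# One resampling object, created once and shared by every workflow
folds <- rsample::vfold_cv(credit, v = 5, strata = "Status")

# Two recipes: one leaves factors alone, one converts them to dummies
rec_plain <- recipes::recipe(Status ~ ., data = credit)
rec_dummy <- rec_plain |>
  recipes::step_dummy(recipes::all_nominal_predictors())

tree_spec <- parsnip::decision_tree(mode = "classification") |>
  parsnip::set_engine("rpart")
glm_spec <- parsnip::logistic_reg() |>
  parsnip::set_engine("glm")

# cross = FALSE pairs the first recipe with the first model, and so on
wf_set <- workflowsets::workflow_set(
  preproc = list(plain = rec_plain, dummy = rec_dummy),
  models = list(tree = tree_spec, glm = glm_spec),
  cross = FALSE
) |>
  workflowsets::workflow_map(
    "fit_resamples",
    resamples = folds,
    metrics = yardstick::metric_set(yardstick::roc_auc),
    control = stacks::control_stack_resamples()
  )

# Every workflow saw the same folds, so all of them can be added in one call
candidates <- stacks::stacks() |>
  stacks::add_candidates(wf_set)
```

Because the folds are created once and handed to workflow_map(), there is no opportunity for the workflows to end up with different resampling objects.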


Ohh that's both incredibly helpful and interesting! Thank you for your answer.
I do have a follow-up question: How come my initial approach works as long as the recipe steps are exactly the same? That's where my assumption came from.

I'm glad to hear it!

Using the xgboost recipe for both model specifications, I see the same error as before with your reprex:

library(tidymodels)
library(stacks)
library(bonsai)
library(lightgbm)
#> Loading required package: R6
#> 
#> Attaching package: 'lightgbm'
#> The following object is masked from 'package:dplyr':
#> 
#>     slice

df <- credit_data |> 
  as_tibble() |> 
  tibble::rowid_to_column("id")

tune_wflow <- function(model_data,
                       model_type) {
  
  # train/test split
  train_test_split <- rsample::initial_split(model_data, strata = "Marital")
  train_set <- rsample::training(train_test_split)
  
  # set up recipe
  base_rec <- recipes::recipe(Status ~ ., data = train_set) |>
    recipes::update_role(id, new_role = "id") |>
    recipes::update_role(Marital, new_role = "strata") |>
    # remove NA
    recipes::step_naomit(recipes::all_predictors())
  
  rec <- 
    base_rec |>
      recipes::step_dummy(recipes::all_nominal_predictors()) |>
      # convert remaining non-numeric predictors
      recipes::step_mutate(
        dplyr::across(tidyselect:::where(is.logical), as.double)
      )
  
  # set up model parameters
  mod <- if (model_type == "xgboost") {
    parsnip::boost_tree(
      min_n = tune::tune(),
      mtry = tune::tune(),
      tree_depth = tune::tune(),
      learn_rate = tune::tune(),
      sample_size = tune::tune()
    ) |>
      parsnip::set_engine("xgboost") |>
      parsnip::set_mode("classification")
    
  } else if (model_type == "lightgbm") {
    parsnip::boost_tree(
      min_n = tune::tune(),
      mtry = tune::tune(),
      tree_depth = tune::tune(),
      learn_rate = tune::tune(),
      sample_size = tune::tune()
    ) |>
      parsnip::set_engine("lightgbm") |>
      parsnip::set_mode("classification")
  }
  
  # bind recipe and model specs to workflow
  wflow <- workflows::workflow() |>
    workflows::add_recipe(rec) |>
    workflows::add_model(mod)
  
  # set up classification metrics
  metrics <- yardstick::metric_set(
    yardstick::accuracy,
    yardstick::roc_auc)
  
  tune <- tune::tune_grid(
    wflow,
    resamples = rsample::vfold_cv(
      train_set,
      v = 5,
      strata = "Marital"),
    grid = 5, 
    metrics = metrics,
    control = stacks::control_stack_grid()
  )
  
  # return tuned results
  return(
    list(tuning_results = tune)
  )
}

# fit ----
models <- c("xgboost", "lightgbm")

tune_list <- list()
# fitting loop
for (model in models) {
  tuning_results <- tune_wflow(df, model_type = model)
  tune_list[[model]] <- tuning_results
}

# stacking ----
mystack <- 
  stacks() |>
  add_candidates(tune_list$xgboost$tuning_results, "xgb") |>
  add_candidates(tune_list$lightgbm$tuning_results, "lgb")
#> Warning: The inputted `candidates` argument `xgb` generated notes during
#> tuning/resampling. Model stacking may fail due to these issues; see
#> `?collect_notes` if so.
#> Warning: The inputted `candidates` argument `lgb` generated notes during
#> tuning/resampling. Model stacking may fail due to these issues; see
#> `?collect_notes` if so.
#> Error:
#> ! It seems like the new candidate member 'lgb' doesn't make use of the
#>   same resampling object as the existing candidates.

Created on 2022-11-21 with reprex v2.0.2

If you see different output, could you share a reprex along with the output of sessionInfo()?

I had this problem with my local script that I cannot share, but your rewrite seems to have fixed that too. I assume that the recipe creation impacted the generation of the resampling object, else I cannot explain this phenomenon.
Either way, thank you for your help!

