How to define smooth terms for a GAM using tidymodels and recipes?

Hello,

The goal of my project is to implement multiple ML algorithms, one of them being a GAM. To keep my code as simple as possible (the same pipeline should run for different specified models), I want to use tidymodels.

My question is how to specify splines in a tidymodels recipe. I want to fit a GAM based on the mgcv package. However, to make use of smoothing I need to define the smoothing functions in the formula.

As explained here (Generalized additive models via mgcv — details_gen_additive_mod_mgcv • parsnip) under "Model fitting", defining the smoothing functions works for fit(), but I would like to preprocess with a recipe and then add it to a workflow in which I do hyperparameter tuning. So defining the formula only in the final fit step is not an option for me.

I noticed that there are splines available in recipes, such as natural splines and B-splines (B-Spline Basis Functions — step_bs • recipes). However, I would like to use thin plate ("tp") or P-splines ("ps") as they are available in mgcv.
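For context, the recipes spline steps expand a column into unpenalized basis columns before the model ever sees the data, which is not the same thing as an mgcv smooth. A minimal sketch with step_bs(), where the choice of Sepal.Width and deg_free = 5 is arbitrary and only for illustration:

```r
library(recipes)

# Expand Sepal.Width into a B-spline basis with 5 degrees of freedom
# (variable and deg_free are illustrative assumptions, not a recommendation)
bs_recipe <- recipe(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris) |>
  step_bs(Sepal.Width, deg_free = 5)

# prep() estimates the basis, bake() returns the processed training data
baked <- bs_recipe |>
  prep() |>
  bake(new_data = NULL)

# The original Sepal.Width column is replaced by the basis columns
names(baked)
```

These columns enter the model as ordinary linear terms, so no smoothing penalty is estimated for them.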

I would appreciate any tips on how to overcome this issue. I haven't found anything online so far.

Here is an example workflow showing what I would like to achieve on an example dataset.

library(tidymodels)

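# Define formula and preprocessing
##### --> This throws the error (s() terms are not allowed in a recipe formula)
full_recipe <- recipe(Species ~ s(Sepal.Length, bs = "ps", k = 10) + s(Sepal.Width, bs = "ps", k = 10),
                      data = iris) |>
  step_normalize(all_numeric_predictors()) |>
  # iris has no `country` column, so select nominal predictors generically
  step_dummy(all_nominal_predictors())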

# Model: hyperparameters, origin library, target variable type
gam_model <- gen_additive_mod() |>
  set_args(select_features = TRUE) |>
  set_args(adjust_deg_free = tune()) |>
  set_engine("mgcv") |>
  set_mode("classification")

# Workflow including the model formula, preprocessing and model
gam_workflow <- workflow() |>
  add_recipe(full_recipe) |>
  add_model(gam_model)

# Define tuning grid over the one tuned parameter
grid <- grid_regular(adjust_deg_free(), levels = 5)

# Hyperparameter Tuning
iris_cv <- vfold_cv(iris, v = 5, repeats = 1, strata = Species)
tune_results <- gam_workflow |>
  tune_grid(resamples = iris_cv,
            grid = grid,
            metrics = metric_set(roc_auc))

# Get optimal parameters
param_optim <- tune_results |> select_best(metric = "roc_auc")

# Finalize workflow (use the gam_workflow object, not the workflow() function)
final_workflow <- gam_workflow |>
  finalize_workflow(param_optim)

# Fit model
final_model <- fit(final_workflow, iris)

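Keep the recipe formula free of smooth terms and add a supplementary model formula to the workflow via add_model(gam_model, formula = <something>). The recipe then handles roles and preprocessing, while the model formula tells mgcv which smooths to fit.

For your example, you could do something like: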


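full_recipe <- recipe(Species ~ Sepal.Length + Sepal.Width,
                      data = iris) |>
  step_normalize(all_numeric_predictors()) |>
  # iris has no `country` column, so select nominal predictors generically
  step_dummy(all_nominal_predictors())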

<snip>

# Workflow including the model formula, preprocessing and model
gam_workflow <- workflow() |>
  add_recipe(full_recipe) |>
  add_model(gam_model, 
            formula = Species ~ s(Sepal.Length, bs="ps", k=10) + s(Sepal.Width, bs="ps", k=10))

There is more about this in the Tidy Modeling with R (TMwR) book too.
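Putting the pieces together, here is a minimal end-to-end sketch of the recipe plus supplementary-formula approach. It makes one assumption that is not in the original example: iris is reduced to two species, because as far as I know the mgcv engine in parsnip handles classification as a binary (binomial) problem. The names iris2, gam_recipe, gam_grid and best_params are just illustrative.

```r
library(tidymodels)

# Assumption: keep two of the three species so the outcome is binary
iris2 <- iris |>
  dplyr::filter(Species != "setosa") |>
  dplyr::mutate(Species = droplevels(Species))

# Recipe: plain variable roles plus preprocessing, no smooth terms here
gam_recipe <- recipe(Species ~ Sepal.Length + Sepal.Width, data = iris2) |>
  step_normalize(all_numeric_predictors())

# Model spec with one tunable argument
gam_model <- gen_additive_mod(select_features = TRUE,
                              adjust_deg_free = tune()) |>
  set_engine("mgcv") |>
  set_mode("classification")

# The smooth terms go into the model formula passed to add_model()
gam_workflow <- workflow() |>
  add_recipe(gam_recipe) |>
  add_model(
    gam_model,
    formula = Species ~ s(Sepal.Length, bs = "ps", k = 10) +
      s(Sepal.Width, bs = "ps", k = 10)
  )

# Resamples and a small regular grid over adjust_deg_free
set.seed(123)
iris_cv <- vfold_cv(iris2, v = 5, strata = Species)
gam_grid <- grid_regular(adjust_deg_free(), levels = 3)

tune_results <- tune_grid(
  gam_workflow,
  resamples = iris_cv,
  grid = gam_grid,
  metrics = metric_set(roc_auc)
)

# Pick the best value, finalize the workflow, and fit on all rows
best_params <- select_best(tune_results, metric = "roc_auc")
final_model <- gam_workflow |>
  finalize_workflow(best_params) |>
  fit(data = iris2)
```

Because the recipe runs first, the s() terms are applied to the normalized columns, so the preprocessing and the smooth specification stay in separate places.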
