translate() for glmnet

Hi all --

I have two questions.

  1. In this parsnip documentation, I would have expected the returned model fit template to include lambda = 0.01. Why doesn't it?
lm_spec <- linear_reg(penalty = 0.01)

# `penalty` is translated to `lambda`
translate(lm_spec, engine = "glmnet")
#> Linear Regression Model Specification (regression)
#> 
#> Main Arguments:
#>   penalty = 0.01
#> 
#> Computational engine: glmnet 
#> 
#> Model fit template:
#> glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
#>     family = "gaussian")
  2. The above question was prompted by my attempt to ensure that my preferred lambda is used on the final test data (via last_fit), and it is not clear to me that it is (i.e., it looks like it is not). Shouldn't the optimal lambda found by best_penalty be used in the last_fit, and shouldn't that be apparent via the test_res$.workflow model call?
library(glmnet)
library(tidymodels)
data(BinomialExample)  # provides the predictor matrix x and outcome vector y
lasso_spec <- logistic_reg(mixture = 1, penalty = tune()) %>%
  set_mode("classification") %>%
  set_engine("glmnet")

binexample <- data.frame(x, y = factor(y))
binexample_split <- initial_split(binexample, strata = y)
binexample_train <- training(binexample_split)
binexample_test <- testing(binexample_split)


binexample_recipe <- recipe(y ~ ., data = binexample) 

my_workflow <- workflow() %>%
  add_recipe(binexample_recipe) %>%
  add_model(lasso_spec)

# in glmnet, default nlambda = 100
lasso_penalty_grid <- grid_regular(penalty(), levels = 100)

my_metrics <- metric_set(mn_log_loss, roc_auc)
## do CV
binexample_cv <- vfold_cv(data = binexample_train, v = 10, strata = y)
lasso_tune_res <- tune_grid(
  my_workflow,
  resamples = binexample_cv,
  grid = lasso_penalty_grid,
  metrics = my_metrics
)
# collect_metrics(lasso_tune_res)

best_penalty <- select_by_one_std_err(lasso_tune_res, metric = "mn_log_loss", desc(penalty))

lasso_final <- finalize_workflow(my_workflow, best_penalty)

test_res <- last_fit(object = lasso_final, split = binexample_split, metrics = my_metrics)
test_res$.workflow

[[1]]
══ Workflow [trained] ══════════════════════════════════════════════════
Preprocessor: Recipe
Model: logistic_reg()

── Preprocessor ────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────

Call:  glmnet::glmnet(x = maybe_matrix(x), y = y, family = "binomial", alpha = ~1)

   Df  %Dev   Lambda
1   0  0.00 0.239200
2   1  2.88 0.217900
3   2  6.64 0.198600
4   2  9.99 0.180900
5   2 12.85 0.164900

Thanks for any insight! I'm running tidymodels 0.1.2.

tl;dr Even though the value is not shown in the path, the predictions are made using the penalty value from your model specification. The value doesn't need to be in the path; glmnet automatically interpolates to get the correct values when your lambda is not shown.

Longer answer:

The nature of the glmnet model makes tidymodels do things in a somewhat non-standard way.

We have a new version of parsnip that has a lot of documentation about how we use this model. You can find the current draft of that Rmd file here.

I've knitted it, and I suspect it answers your question (but follow up if it does not).


tidymodels and glmnet

The implementation of the glmnet package has some nice features. For example, one of the main tuning parameters, the regularization penalty, does not need to be specified when fitting the model. The package fits a compendium of values, called the regularization path. These values depend on the data set and the value of alpha, the mixture parameter between a pure ridge model (alpha = 0) and a pure lasso model (alpha = 1). When predicting, any penalty values can be predicted simultaneously, even values that are not exactly on the regularization path. For those, the model interpolates between the closest path values to produce a prediction. There is an argument called lambda to the glmnet() function that is used to specify the path.
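
To see this interpolation directly (a minimal sketch that calls glmnet itself; the mtcars example here is mine, not from the original discussion):

library(glmnet)

x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

# fit the full regularization path (lambda values chosen by glmnet)
fit_path <- glmnet(x, y)

# s = 0.123 is almost certainly not on the fitted path; by default
# (exact = FALSE), predict() interpolates between the nearest path values
predict(fit_path, newx = x[1:3, ], s = 0.123)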

In the discussion below, linear_reg() is used. The information is true for all parsnip models that have a "glmnet" engine.

Fitting and predicting using parsnip

Recall that tidymodels uses standardized parameter names across models, chosen to be low on jargon. The argument penalty is the equivalent of what glmnet calls the lambda value, and mixture is the same as their alpha value.

In tidymodels, our predict() methods are defined to make one prediction at a time. For this model, that means predictions are for a single penalty value. For this reason, models that have glmnet engines require the user to always specify a single penalty value when the model is defined. For example, for linear regression:

linear_reg(penalty = 1) %>% set_engine("glmnet")

When the predict() method is called, it automatically uses the penalty that was given when the model was defined. For example:

library(tidymodels)

fit <- 
  linear_reg(penalty = 1) %>% 
  set_engine("glmnet") %>% 
  fit(mpg ~ ., data = mtcars)

# predict at penalty = 1
predict(fit, mtcars[1:3,])
## # A tibble: 3 x 1
##   .pred
##   <dbl>
## 1  22.2
## 2  21.5
## 3  24.9

However, any penalty values can be predicted simultaneously using the multi_predict() method:

# predict at c(0.00, 0.01)
multi_predict(fit, mtcars[1:3,], penalty = c(0.00, 0.01))
## # A tibble: 3 x 1
##   .pred           
##   <list>          
## 1 <tibble [2 × 2]>
## 2 <tibble [2 × 2]>
## 3 <tibble [2 × 2]>
# unnested:
multi_predict(fit, mtcars[1:3,], penalty = c(0.00, 0.01)) %>% 
  add_rowindex() %>% 
  unnest(cols = ".pred")
## # A tibble: 6 x 3
##   penalty .pred  .row
##     <dbl> <dbl> <int>
## 1    0     22.6     1
## 2    0.01  22.5     1
## 3    0     22.1     2
## 4    0.01  22.1     2
## 5    0     26.3     3
## 6    0.01  26.3     3

Where did lambda go?

It may appear odd that the lambda value does not get used in the fit:

linear_reg(penalty = 1) %>% 
  set_engine("glmnet") %>% 
  translate()
## Linear Regression Model Specification (regression)
## 
## Main Arguments:
##   penalty = 1
## 
## Computational engine: glmnet 
## 
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
##     family = "gaussian")

Internally, the value of penalty = 1 is saved in the parsnip object and no value is set for lambda. This enables the full path to be fit by glmnet(). See the section below about setting the path.
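
For illustration (a sketch that pokes at parsnip internals; this location is not a stable API), the stored value can be seen in the fitted parsnip object from above:

# the penalty is kept in the model specification inside the parsnip fit;
# it is stored as a quosure wrapping the value 1
fit$spec$args$penalty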

How do I set the regularization path?

Regardless of what value you use for penalty, the full coefficient path is used when glmnet::glmnet() is called.

What if you want to manually set this path? Normally, you would pass a vector to the lambda argument of glmnet::glmnet().

parsnip models that use a glmnet engine can use a special optional argument called path_values. This is not an argument to glmnet::glmnet(); it is used by parsnip to independently set the path.

For example, we have found that if you want a pure ridge regression model (i.e., mixture = 0), you can get the wrong coefficients if the path does not contain zero (see issue #431).

If we want to use our own path, the argument is passed as an engine-specific option:

coef_path_values <- c(0, 10^seq(-5, 1, length.out = 7))

fit_ridge <- 
  linear_reg(penalty = 1, mixture = 0) %>% 
  set_engine("glmnet", path_values = coef_path_values) %>% 
  fit(mpg ~ ., data = mtcars)

all.equal(sort(fit_ridge$fit$lambda), coef_path_values)
## [1] TRUE
# predict at penalty = 1
predict(fit_ridge, mtcars[1:3,])
## # A tibble: 3 x 1
##   .pred
##   <dbl>
## 1  22.1
## 2  21.8
## 3  26.6

Tidying the model object

broom::tidy() is a function that gives a summary of the object as a tibble.

tl;dr tidy() on a glmnet model produced by parsnip gives the coefficients for the value given by penalty.

When parsnip makes a model, it gives it an extra class. When the tidy() method is used on that object, it produces coefficients for the penalty that was originally requested:

tidy(fit)
## # A tibble: 11 x 3
##    term        estimate penalty
##    <chr>          <dbl>   <dbl>
##  1 (Intercept)  35.3          1
##  2 cyl          -0.872        1
##  3 disp          0            1
##  4 hp           -0.0101       1
##  5 drat          0            1
##  6 wt           -2.59         1
##  7 qsec          0            1
##  8 vs            0            1
##  9 am            0            1
## 10 gear          0            1
## 11 carb          0            1

Note that there is a tidy() method for glmnet objects in the broom package. If this is used directly on the underlying glmnet object, it returns all of the coefficients on the path:

# Use the basic tidy() method for glmnet
all_tidy_coefs <- broom:::tidy.glmnet(fit$fit)
all_tidy_coefs
## # A tibble: 640 x 5
##    term         step estimate lambda dev.ratio
##    <chr>       <dbl>    <dbl>  <dbl>     <dbl>
##  1 (Intercept)     1     20.1   5.15     0    
##  2 (Intercept)     2     21.6   4.69     0.129
##  3 (Intercept)     3     23.2   4.27     0.248
##  4 (Intercept)     4     24.7   3.89     0.347
##  5 (Intercept)     5     26.0   3.55     0.429
##  6 (Intercept)     6     27.2   3.23     0.497
##  7 (Intercept)     7     28.4   2.95     0.554
##  8 (Intercept)     8     29.4   2.68     0.601
##  9 (Intercept)     9     30.3   2.45     0.640
## 10 (Intercept)    10     31.1   2.23     0.673
## # … with 630 more rows
length(unique(all_tidy_coefs$lambda))
## [1] 79

This can be nice for plots but it might not contain the penalty value that you are interested in.
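
For example, a quick plot of the whole path (a sketch using ggplot2, which tidymodels attaches; dropping the intercept is my choice):

all_tidy_coefs %>% 
  filter(term != "(Intercept)") %>% 
  ggplot(aes(x = lambda, y = estimate, col = term)) + 
  geom_line() + 
  scale_x_log10()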

@Max, thank you for the detailed explanation! Thanks to you and Julia and Hannah for the great documentation. It makes sense, and I think I just have one remaining question:

As I understand it, the last_fit object returns (via .metrics[[1]]) the performance as estimated on the testing data using the lambda that I chose.

best_penalty <- select_by_one_std_err(lasso_tune_res, metric = "mn_log_loss", desc(penalty))

lasso_final <- finalize_workflow(my_workflow, best_penalty)

test_res <- last_fit(object = lasso_final, split = binexample_split, metrics = my_metrics)

But is there a way for me to confirm that via the test_res object itself? str(test_res) returns so much information that I expect the answer is yes, but I haven't figured it out myself yet.

I'm just curious whether, for saving model objects, I can save only test_res and still recover best_penalty as the lambda value used in the test set evaluation.

Thank you!!

Not from the underlying glmnet object. We save the lambda value of your choice in another location:

test_res$.workflow[[1]]$fit$fit$spec$args$penalty

(but we really advise against using that path or value manually)
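
If you want it programmatically, an accessor is a bit safer than reaching into the structure by hand (a sketch; I'm assuming the pull_workflow_spec() accessor from the workflows package of that era):

# hypothetical alternative: pull the model spec from the fitted workflow
wf_spec <- workflows::pull_workflow_spec(test_res$.workflow[[1]])
wf_spec$args$penalty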

Perfect, thanks @Max! I won't plan on using that, but I do appreciate knowing where the information is. test_res$.workflow[[1]]$fit$fit$spec is very helpful, and I hadn't yet found my way there.

Thanks again!
