tidymodels lasso scaled variables

I'm sure this answer is in the documentation somewhere, but I'm not having any luck finding it. When implementing lasso in tidymodels, do you need to step_normalize() the numeric variables first, even though the glmnet() function uses standardize = TRUE by default? Thanks!

Hi @lisalendway,

I would use the template recipe from usemodels::use_glmnet:

See examples: https://usemodels.tidymodels.org/reference/templates.html#examples

library(usemodels)
library(palmerpenguins)
data(penguins)
use_glmnet(species ~ ., data = penguins)
#> glmnet_recipe <- 
#>   recipe(formula = species ~ ., data = penguins) %>% 
#>   step_novel(all_nominal(), -all_outcomes()) %>% 
#>   step_dummy(all_nominal(), -all_outcomes()) %>% 
#>   step_zv(all_predictors()) %>% 
#>   step_normalize(all_predictors(), -all_nominal()) 

The advantage of doing the normalization inside the recipe is that the same step is also applied when you "bake" the test set, using the mean/sd estimated from the training set. Your predictors will then be on the same scale in both the training and testing sets, which may make them easier to interpret/visualize/etc. (see skip = FALSE as the default here: https://recipes.tidymodels.org/reference/step_normalize.html). I would guess there is no harm if glmnet also takes that step: the variables will already be normalized, so standardizing them again should not matter. An added bonus is that you won't have to worry about data leakage from the test set, because the recipe estimates the mean/sd from the training data only.
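To make the prep/bake point concrete, here is a minimal sketch. It is not from the template above: the mtcars data, the row-based train/test split, and the chosen predictors are assumptions made only for illustration. It checks that bake() scales new data with the mean/sd stored from the training set.

```r
library(recipes)

# Hypothetical split for illustration only: first 24 rows train, last 8 test
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

rec <- recipe(mpg ~ disp + hp + wt, data = train) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations from the training set
prepped <- prep(rec, training = train)

# tidy() on the first step lists the stored training statistics
tidy(prepped, number = 1)

# bake() applies those stored statistics to the new data
baked_test <- bake(prepped, new_data = test)

# Manual check: scale the test column with the *training* mean and sd
manual_hp <- (test$hp - mean(train$hp)) / sd(train$hp)
stopifnot(isTRUE(all.equal(baked_test$hp, manual_hp)))
```

If the recipe is instead added to a workflow and fit with resampling, the same estimate-on-training, apply-to-holdout logic should be handled for each resample without calling prep() and bake() by hand.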

Right! I wasn't thinking about the step of applying it to new data. Now that makes total sense. And thanks for the reference - that's exactly what I was looking for.
