I'm sure this answer is in documentation somewhere, but I'm not having any luck finding it. For implementing lasso in tidymodels, do you need to step_normalize() numeric variables first, even though the glmnet() function uses standardize = TRUE by default? Thanks!
I would use the template recipe from usemodels::use_glmnet:
library(usemodels)
library(palmerpenguins)
data(penguins)

use_glmnet(species ~ ., data = penguins)
#> glmnet_recipe <-
#>   recipe(formula = species ~ ., data = penguins) %>%
#>   step_novel(all_nominal(), -all_outcomes()) %>%
#>   step_dummy(all_nominal(), -all_outcomes()) %>%
#>   step_zv(all_predictors()) %>%
#>   step_normalize(all_predictors(), -all_nominal())
The advantage of doing it inside the recipe (and I would guess there is no harm in glmnet standardizing again, since already-normalized variables are unchanged by standardization) is that the same step will also be applied when you bake() the test set, using the mean/sd learned from the training set. Your predictors will then be on the same scale in both the training and testing sets, which can make them easier to interpret/visualize/etc. (note that skip = FALSE is the default: https://recipes.tidymodels.org/reference/step_normalize.html). An added bonus is that you won't have to worry about data leakage when using the recipe.
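To make that concrete, here is a minimal sketch (the split and recipe setup are just illustrative, not part of the use_glmnet template above) showing that prep() estimates the means/sds on the training set only, and bake() then applies those same training statistics to new data:

```r
library(tidymodels)
library(palmerpenguins)
data(penguins)

set.seed(123)
penguins <- tidyr::drop_na(penguins)
split <- initial_split(penguins)
train <- training(split)
test  <- testing(split)

rec <- recipe(species ~ ., data = train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the normalization parameters from `train` only
prepped <- prep(rec, training = train)

# bake() applies the *training* means/sds to the test set,
# so no information leaks from the test data
baked_test <- bake(prepped, new_data = test)
```

Because the test set is scaled with training-set statistics, its normalized columns will generally not have mean exactly 0 or sd exactly 1, which is the behavior you want for honest evaluation.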
Right! I wasn't thinking about the step of applying it to new data. Now that makes total sense. And thanks for the reference - that's exactly what I was looking for.