LASSO with tidymodels not working


Use the tidymodels framework to implement LASSO with nested cross-validation (CV). (Alternatively, I'd also be interested in an implementation with caret, but at this point tidymodels is preferred.)

  • Make LASSO work with bootstrap resampling
  • Replace bootstrap resampling with nested CV: an inner CV (or bootstrap) loop that searches a grid to determine the hyperparameter, and an outer CV loop that estimates how well the tuned models perform


  1. I started from Julia Silge's code, which tunes with bootstrap resampling rather than nested CV.
  2. Next, I want to add nested CV by replacing the bootstrap resamples with a nested-CV routine.


The code below throws an error, probably because LASSO fails with bootstrap resampling.


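# Package imports ------
library(tidyverse)   # read_csv()
library(tidymodels)  # recipes, rsample, parsnip, tune, dials, workflows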

# Data ------
# Prepared according to the Blog post by Julia Silge
urlfile = ''
office = read_csv(url(urlfile))[-1]
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   .default = col_double()
#> )
#> See spec(...) for full column specifications.

# Lasso modeling -------
## Recipe and train it 
office_rec <- recipe(imdb_rating ~ ., data = office) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric(), -all_outcomes()) %>%
  prep(strings_as_factors = FALSE) # Training

## Create workflow 
wf <- workflow() %>%
  add_recipe(office_rec)

## Parameter tuning 
### Bootstrapping data for resampling
office_boot <- bootstraps(office, times = 5, strata = season)

### Create lambda search grid
lambda_grid <- grid_regular(penalty(), levels = 20)

### The model
tune_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

### Apply the workflow
lasso_grid <- tune_grid(
  wf %>% add_model(tune_spec),
  resamples = office_boot,
  grid = lambda_grid
)
# ("Standardabweichung ist Null" = "the standard deviation is zero")
#> ! Bootstrap1: internal: Standardabweichung ist Null
#> ! Bootstrap2: internal: Standardabweichung ist Null
#> ! Bootstrap3: internal: Standardabweichung ist Null
#> ! Bootstrap4: internal: Standardabweichung ist Null
#> ! Bootstrap5: internal: Standardabweichung ist Null
#> Error: `x` and `y` must have same types and lengths

Created on 2020-06-10 by the reprex package (v0.3.0)

After upgrading dplyr to 1.0.0, rerunning the same code no longer raises the error, but it still produces these warnings:
#> ! Bootstrap1: internal: A correlation computation is required, but `estimate` is const...
#> ! Bootstrap2: internal: A correlation computation is required, but `estimate` is const...
#> ! Bootstrap3: internal: A correlation computation is required, but `estimate` is const...
#> ! Bootstrap4: internal: A correlation computation is required, but `estimate` is const...
#> ! Bootstrap5: internal: A correlation computation is required, but `estimate` is const...
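These warnings probably do not mean that LASSO itself fails. By default, `tune_grid()` scores regression models with RMSE and R², and R² is computed from a correlation. `grid_regular(penalty(), levels = 20)` spans penalties from 10^-10 to 1, and toward the large end LASSO can shrink every coefficient to zero, so the model predicts the same value for every row and the correlation is undefined. A small sketch with made-up numbers (not the office data) reproduces that case in isolation:

```r
library(dials)
library(yardstick)

# Default penalty grid: 20 values evenly spaced on the log10 scale,
# from 10^-10 to 10^0 (the default range of dials::penalty())
lambda_grid <- grid_regular(penalty(), levels = 20)
range(lambda_grid$penalty)

# Made-up ratings and the constant prediction a LASSO model makes
# once the penalty has shrunk all coefficients to zero
truth    <- c(8.1, 7.9, 8.4, 7.6, 8.0)
estimate <- rep(mean(truth), length(truth))

rmse_vec(truth, estimate)                          # still well defined
rsq_const <- suppressWarnings(rsq_vec(truth, estimate))
rsq_const                                          # NA: correlation with a constant
```

If RMSE is the only criterion for choosing lambda, passing `metrics = metric_set(rmse)` to `tune_grid()` skips the R² computation and therefore these warnings.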


We don't currently support nested resampling in tune.
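Since tune does not run nested resampling on its own, one workaround is to loop over the outer folds yourself: tune on each fold's inner resamples, refit the selected penalty on the outer analysis set, and score it on the outer assessment set. A minimal sketch, assuming mtcars as stand-in data (the office data URL is not shown), 5 outer folds and 5 inner bootstraps; `fit_outer_fold()` is a hypothetical helper name, not part of tidymodels:

```r
library(tidymodels)

set.seed(2020)

# Hypothetical stand-in data: mtcars instead of the office data.
# Outer loop: 5-fold CV to estimate performance.
# Inner loop: 5 bootstraps of each outer analysis set to tune lambda.
folds <- nested_cv(mtcars,
                   outside = vfold_cv(v = 5),
                   inside  = bootstraps(times = 5))

lasso_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

lasso_wf <- workflow() %>%
  add_recipe(lasso_rec) %>%
  add_model(linear_reg(penalty = tune(), mixture = 1) %>%
              set_engine("glmnet"))

lambda_grid <- grid_regular(penalty(), levels = 20)

# Hypothetical helper: tune on the inner resamples of one outer fold,
# refit the best penalty on the outer analysis set, and score it on the
# outer assessment set, which played no part in tuning
fit_outer_fold <- function(outer_split, inner_resamples) {
  tuned <- tune_grid(lasso_wf,
                     resamples = inner_resamples,
                     grid      = lambda_grid,
                     metrics   = metric_set(rmse))
  best <- select_best(tuned, metric = "rmse")

  final_fit <- lasso_wf %>%
    finalize_workflow(best) %>%
    fit(data = analysis(outer_split))

  holdout <- assessment(outer_split)
  tibble(penalty = best$penalty,
         rmse    = rmse_vec(holdout$mpg, predict(final_fit, holdout)$.pred))
}

outer_results <- map2_dfr(folds$splits, folds$inner_resamples, fit_outer_fold)

outer_results              # chosen lambda and holdout RMSE per outer fold
mean(outer_results$rmse)   # nested-CV estimate of the RMSE
```

Each outer fold may pick a different lambda; the averaged outer RMSE estimates the performance of the whole tuning procedure rather than of one particular lambda.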

I edited the post and appended the versions of the loaded packages. The versions I'm working with should be relatively recent, though I think your dplyr and ggplot2 versions are a bit newer.

I can reproduce it now. It looks like you'll have to downgrade to rsample 0.0.6 or upgrade dplyr to 1.0.0.
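Either fix can be applied from within R. A minimal sketch (option 2 assumes the remotes package is installed and is left commented out because it is an alternative to option 1):

```r
# Check the installed versions of the two packages involved
packageVersion("dplyr")
packageVersion("rsample")

# Option 1: upgrade dplyr to 1.0.0 or later from CRAN
if (packageVersion("dplyr") < "1.0.0") install.packages("dplyr")

# Option 2 (instead of option 1): pin rsample to 0.0.6
# remotes::install_version("rsample", version = "0.0.6")
```

Restart the R session after installing so the new version is the one that gets loaded.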

I upgraded dplyr to 1.0.0 and also ggplot2, so that I now have the same package versions as you do (Edit 2). The error "`x` and `y` must have same types and lengths" no longer occurs. Thank you. With bootstrapping, however, I still get the warnings that the internal standard deviation is zero. When I use a fixed lambda value, I obtain results, but not when I tune over lambda_grid with the bootstraps.
LASSO also works with caret and a grid I defined myself.
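For comparison, the caret route with a self-defined grid might look like the sketch below. It assumes mtcars as stand-in data and an illustrative lambda grid; caret's `"glmnet"` method tunes `alpha` and `lambda`, and fixing `alpha = 1` gives a pure LASSO. caret does not nest the resampling either; `trainControl(method = "boot")` only covers the inner tuning loop.

```r
library(caret)  # method = "glmnet" also needs the glmnet package installed

set.seed(2020)

# Hypothetical stand-in data (mtcars) and a self-defined lambda grid;
# alpha = 1 turns glmnet's elastic net into a pure LASSO
lasso_grid <- expand.grid(alpha  = 1,
                          lambda = 10^seq(-4, 0, length.out = 20))

lasso_caret <- train(
  mpg ~ ., data = mtcars,
  method     = "glmnet",
  preProcess = c("zv", "center", "scale"),
  tuneGrid   = lasso_grid,
  trControl  = trainControl(method = "boot", number = 5)
)

lasso_caret$bestTune   # lambda with the lowest bootstrap RMSE

# Coefficients of the final model at the selected lambda
coef(lasso_caret$finalModel, s = lasso_caret$bestTune$lambda)
```

With a grid that reaches large lambda values, caret may report the same constant-prediction problem, as a warning about missing values in the resampled performance measures.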

I read that, but was hoping to find a workaround, or better, a way to compare the individual coefficients of the variables across lambda values instead of only the RMSE for each lambda value.
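To compare coefficients across lambda values, the underlying glmnet fit already contains the whole coefficient path, and broom's `tidy()` method for glmnet objects returns it as one row per term and lambda step. A minimal sketch, assuming mtcars with standardized predictors as stand-in data; the `penalty = 0.1` value is arbitrary and only affects `predict()`, not the stored path:

```r
library(tidymodels)  # attaches parsnip, dplyr, ggplot2 and broom

# Hypothetical stand-in data: mtcars with standardized predictors
mtcars_std <- mtcars %>%
  mutate(across(-mpg, ~ as.numeric(scale(.x))))

# parsnip's glmnet engine fits the full lambda path; penalty is used
# when predicting from the model
lasso_fit <- linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(mpg ~ ., data = mtcars_std)

# broom::tidy() on the glmnet object: columns term, step, estimate,
# lambda, dev.ratio (zero coefficients are dropped by default)
coef_path <- tidy(lasso_fit$fit)

# Coefficient path: each line shows one variable's estimate versus lambda
coef_path %>%
  filter(term != "(Intercept)") %>%
  ggplot(aes(lambda, estimate, colour = term)) +
  geom_line() +
  scale_x_log10() +
  labs(x = "lambda (penalty)", y = "coefficient")
```

Variables whose lines stay away from zero up to large lambda values are the ones LASSO keeps longest.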

