Dynamically add steps to a recipe: evaluating expressions within a loop

Hi!
I would like a variable within a loop to be evaluated immediately. Specifically, I want to build a tidymodels recipe that gets one PCA step added per iteration of a loop; the problem seems to be that the selector is not evaluated inside the loop. (I'm open to other solutions if there is a better approach than a loop.)

My code so far:

library(tidymodels)

# Example data
set.seed(42)
df_x1 <- tibble(V_set1.V1=runif(10, 1, 10), V_set1.V2=runif(10, 1, 10))
df_x2 <- tibble(V_set2.V3=runif(10, 1, 10), V_set2.V4=runif(10, 1, 10))
y <- runif(10, 1, 10)
My_data <- cbind(df_x1, df_x2, y)

# Example recipe
My_recipe <-
  recipe(y ~ .,
              data = My_data) %>%
  step_scale(all_predictors())
My_recipe

# Variables starting with these strings will be selected in the for loop
variable_index_vec <- c("V_set1", "V_set2")

# Loop that should add one PCA step per iteration: first selecting all variables starting with V_set1, then those starting with V_set2.
for (i in 1:length(variable_index_vec)){

  variable_index <- variable_index_vec[i]

  prefix_index <- paste("group_", i, sep = "")

  My_recipe <- step_pca(My_recipe, starts_with(eval(parse(text=variable_index_vec[i]))), threshold = .95, prefix = prefix_index)
}
# However, the stored selector does not show V_set1 here, so it does not work later!
My_recipe[3]$steps[[2]][1]
My_recipe[3]$steps[[3]][1]


# FYI, adding them like below works fine: 
My_recipe <- step_pca(My_recipe, starts_with("V_set1"), threshold = .95, prefix = prefix_index)
My_recipe <- step_pca(My_recipe, starts_with("V_set2"), threshold = .95, prefix = prefix_index)
My_recipe[3]$steps[[4]][1]
My_recipe[3]$steps[[5]][1]

Thanks in advance!

You can add them as new steps to the recipe. Here's a teeny amount of rlang metaprogramming:

library(tidymodels)
#> ── Attaching packages ───────────────────────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom     0.5.4          ✓ recipes   0.1.10    
#> ✓ dials     0.0.4          ✓ rsample   0.0.5.9000
#> ✓ dplyr     0.8.5          ✓ tibble    2.1.3     
#> ✓ ggplot2   3.3.0          ✓ tune      0.0.1.9000
#> ✓ infer     0.5.1          ✓ workflows 0.1.0     
#> ✓ parsnip   0.0.5.9000     ✓ yardstick 0.0.5     
#> ✓ purrr     0.3.3
#> ── Conflicts ──────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard()  masks scales::discard()
#> x dplyr::filter()   masks stats::filter()
#> x dplyr::lag()      masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step()   masks stats::step()

#Example data
set.seed(42)
df_x1 <-
  tibble(V_set1.V1 = runif(10, 1, 10),
         V_set1.V2 = runif(10, 1, 10))
df_x2 <-
  tibble(V_set2.V3 = runif(10, 1, 10),
         V_set2.V4 = runif(10, 1, 10))
y <- runif(10, 1, 10)
My_data <- cbind(df_x1, df_x2, y)

# Example recipe
My_recipe <-
  recipe(y ~ ., data = My_data) %>%
  step_scale(all_predictors())
My_recipe
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          4
#> 
#> Operations:
#> 
#> Scaling for all_predictors

# Variables starting with these strings will be selected in the for loop
variable_index_vec <- c("V_set1", "V_set2")

# Loop that adds one PCA step per iteration: first for the variables matching V_set1, then for those matching V_set2.
for (i in variable_index_vec) {
  My_recipe <- 
    My_recipe %>% 
    # !! injects (unquotes) the current value of `i` into the `matches()` call.
    # We use a custom prefix so there are no name collisions between the
    # results of each PCA step.
    step_pca(matches(!!i), prefix = paste0("PCA_", i, "_"))
}

My_recipe %>% prep()
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          4
#> 
#> Training data contained 10 data points and no missing data.
#> 
#> Operations:
#> 
#> Scaling for V_set1.V1, V_set1.V2, V_set2.V3, V_set2.V4 [trained]
#> PCA extraction with V_set1.V1, V_set1.V2 [trained]
#> PCA extraction with V_set2.V3, V_set2.V4 [trained]

Created on 2020-03-26 by the reprex package (v0.3.0)
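
A note on why `!!` is needed here, as a minimal sketch rather than a description of the recipes internals: the assumption is that step functions capture their selectors as quosures and only evaluate them at `prep()` time. Without `!!`, what gets stored is the expression that mentions the loop variable, not its value at that iteration, so every step would later see whatever the variable holds after the loop ends. The snippet below imitates that capture with `rlang::quo()`; the names `prefixes`, `p`, `lazy` and `eager` are made up for the illustration.

```r
library(rlang)

prefixes <- c("V_set1", "V_set2")

lazy  <- vector("list", length(prefixes))
eager <- vector("list", length(prefixes))

for (k in seq_along(prefixes)) {
  p <- prefixes[k]
  # Without !!, the quosure stores the symbol `p`, to be looked up later.
  lazy[[k]]  <- quo(starts_with(p))
  # With !!, the current string is injected into the captured call.
  eager[[k]] <- quo(starts_with(!!p))
}

# Deparse the captured expressions to see what was actually stored.
lazy_text  <- vapply(lazy,  function(q) deparse(quo_get_expr(q)), character(1))
eager_text <- vapply(eager, function(q) deparse(quo_get_expr(q)), character(1))

writeLines(lazy_text)
#> starts_with(p)
#> starts_with(p)
writeLines(eager_text)
#> starts_with("V_set1")
#> starts_with("V_set2")
```

Both "lazy" expressions are identical and depend on `p` still holding the right value when they are finally evaluated, which it will not after the loop has moved on. The "eager" ones carry the string itself, which is what the PCA steps need.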

Really Cool! Thank you ever so much for the help!
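
Since the question was open to alternatives to a loop, here is one possible sketch using `purrr::reduce()`, which threads the recipe through a helper once per group. The helper name `add_group_pca` is hypothetical, and this version uses `starts_with()` instead of `matches()` because the original question selected by prefix; the `threshold` argument from the question is left out to keep it short.

```r
library(tidymodels)

# Same example data as in the question, built in one tibble
set.seed(42)
My_data <- tibble(
  V_set1.V1 = runif(10, 1, 10),
  V_set1.V2 = runif(10, 1, 10),
  V_set2.V3 = runif(10, 1, 10),
  V_set2.V4 = runif(10, 1, 10),
  y = runif(10, 1, 10)
)

variable_index_vec <- c("V_set1", "V_set2")

# Hypothetical helper: adds one PCA step for the columns starting with `group`.
# !! injects the string so the step does not depend on `group` later on.
add_group_pca <- function(rec, group) {
  step_pca(rec, starts_with(!!group), prefix = paste0("PCA_", group, "_"))
}

# Start from the scaled recipe and add one PCA step per group
My_recipe <- purrr::reduce(
  variable_index_vec,
  add_group_pca,
  .init = recipe(y ~ ., data = My_data) %>% step_scale(all_predictors())
)

# One scaling step plus one PCA step per group
length(My_recipe$steps)
#> [1] 3

My_prepped <- prep(My_recipe)
```

Whether this reads better than the `for` loop is a matter of taste; the result should be the same kind of recipe, just built without reassigning `My_recipe` inside a loop.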
