Is there a function that returns the list/vector/etc. of variables that "survive" the step functions? I have a simple reproducible example below. I'd like to be able to get a list of the variables that are left at the end; in this case, only x2. This seems somewhat related to this post.
library(tidymodels)
set.seed(123)
samp_size <- 1000
# Creating sample data where x2 is highly correlated with x and x3 has near-zero variance
sample_data <- tibble(x = rnorm(samp_size)) %>%
  mutate(
    y = 3 + 2 * x + rnorm(samp_size, 0, .5),
    x2 = x + rnorm(samp_size, 0, .1),
    x3 = c(rep(0, samp_size - 1), 1)
  )
# Didn't break out training and testing since it's not needed for this simple example.
simple_recipe <- recipe(y ~ ., sample_data) %>%
  step_nzv(all_numeric_predictors()) %>%
  step_corr(all_numeric_predictors())
# When I run the steps, I'm only left with y and x2
simple_recipe %>%
  prep() %>%
  juice()
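For reference, here is a minimal sketch of the kind of result I'm after, assuming that `summary()` on a prepped recipe reflects the variables remaining after all steps and that `tidy()` with a step `number` reports the columns that step removed. The names `prepped`, `surviving`, `removed_nzv`, and `removed_corr` are just my own placeholders, and I'm not sure this is the intended way to do it.

```r
library(tidymodels)

set.seed(123)
samp_size <- 1000

# Same sample data as above
sample_data <- tibble(x = rnorm(samp_size)) %>%
  mutate(
    y = 3 + 2 * x + rnorm(samp_size, 0, .5),
    x2 = x + rnorm(samp_size, 0, .1),
    x3 = c(rep(0, samp_size - 1), 1)
  )

# Prep the recipe once and keep the trained object around
prepped <- recipe(y ~ ., sample_data) %>%
  step_nzv(all_numeric_predictors()) %>%
  step_corr(all_numeric_predictors()) %>%
  prep()

# summary() returns one row per variable still in the recipe,
# with its type, role, and source
surviving <- summary(prepped) %>%
  filter(role == "predictor") %>%
  pull(variable)

# tidy() on a single step lists the terms that step dropped
removed_nzv <- tidy(prepped, number = 1)$terms
removed_corr <- tidy(prepped, number = 2)$terms

surviving
```

Comparing `surviving` against `removed_nzv` and `removed_corr` would also show which step was responsible for dropping each column, which `juice()` alone doesn't tell me.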