Hey,
I'm having trouble using step_nzv from recipes to filter out numeric attributes that have small variance but continuous values. It seems to me that the step only works for nominal values, since it calculates the number of unique values and the ratio of the most common value to the second most common. However, I have an attribute that is close to zero almost everywhere but never exactly zero. Do I have to bin first (and discretizing with equal-sized bins would change everything)?
The code below is a minimal example. I expect both columns, low_variance_num and low_variance_nom, to be filtered out:
library(tidymodels)
data <- tibble(num = seq(1000), rand = runif(1000)) %>%
  mutate(low_variance_num = ifelse(num == 1, 1, rand / 10000),
         low_variance_nom = ifelse(num == 1, 1, 0))
data
var(data$low_variance_num)
var(data$low_variance_nom)
recipe <- recipe(formula = num ~ ., data = data) %>%
  update_role("num", new_role = "label") %>%
  step_nzv(all_predictors(), freq_cut = 995/5, unique_cut = 10) %>% # 5 min up to this point
  prep()
summary(recipe)
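To show what I mean, here is a rough sketch that computes the two statistics by hand, as I understand them from the documentation: the frequency ratio (most common count divided by second most common count) and the percentage of unique values. The helper nzv_stats is my own hypothetical function, not part of recipes, and the real implementation may differ in details such as NA handling. My understanding is that a column is only dropped when the frequency ratio is above freq_cut and the unique percentage is below unique_cut.

```r
# Hypothetical helper (not part of recipes): hand-rolled versions of the
# two statistics that step_nzv is documented to be based on.
nzv_stats <- function(x) {
  # Counts per distinct value, largest first
  counts <- sort(table(x), decreasing = TRUE)
  # Ratio of the most common to the second most common value
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else Inf
  # Distinct values as a percentage of all rows
  pct_unique <- 100 * length(counts) / length(x)
  c(freq_ratio = freq_ratio, pct_unique = pct_unique)
}

set.seed(1)
num <- seq(1000)
rand <- runif(1000)
low_variance_num <- ifelse(num == 1, 1, rand / 10000)
low_variance_nom <- ifelse(num == 1, 1, 0)

# Continuous column: (almost) every value is distinct
nzv_stats(low_variance_num)
# Nominal column: 999 zeros and a single 1
nzv_stats(low_variance_nom)
```

If this is right, the continuous column has a frequency ratio near 1 and close to 100% unique values, so it can never cross the freq_cut threshold no matter how small its variance is, while the nominal column does.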
Thanks!
P.S.: Is there a way to use recipes without providing a formula? In this case the formula is meaningless.
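From what I can tell from the documentation, recipe() also accepts a data frame on its own, and the roles can then be assigned afterwards with update_role(). Below is a minimal sketch of what I have in mind. The role name "label" is just the one I used above, and I am not sure this is the recommended approach.

```r
library(tidymodels)

data <- tibble(num = seq(1000), rand = runif(1000)) %>%
  mutate(low_variance_num = ifelse(num == 1, 1, rand / 10000),
         low_variance_nom = ifelse(num == 1, 1, 0))

# No formula: start from the data frame and assign roles explicitly
rec <- recipe(data) %>%
  update_role(rand, low_variance_num, low_variance_nom,
              new_role = "predictor") %>%
  update_role(num, new_role = "label")

# One row per variable with its assigned role
summary(rec)
```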