Hi everyone
I am currently working on a project in which I'd like to use the tidymodels framework and, more precisely, the {recipes}
package for my data preprocessing.
I need to cover various imputation steps. Among other things, I would like to use step_lowerimpute()
to impute missing values for a feature with the minimum value in the training data.
However, I'm running into trouble here. Training the recipe with this step works (i.e. the minimum value in the training data is stored), but applying the trained recipe to any data has no effect: the missing values are not imputed.
I don't know if I'm simply making a mistake, but I could not figure anything out, not even with the following very simple reprex:
library(recipes)
library(dplyr)
# data
set.seed(123)
df <- tibble(
  a = letters[1:10],
  b = rnorm(10),
  c = c(rep(1, 3), rep(2, 2), rep(NA, 2), rep(10, 3))
)
# recipe
rec <- recipe(~ ., data = df)
rec_imp <- rec %>%
  step_lowerimpute(c)
# trained recipe
rec_imp_trained <- rec_imp %>%
  prep()
# you can see that the training has worked
tidy(rec_imp_trained, number = 1)
# but it is not applied to the data
rec_imp_trained %>%
  juice()
# also does not work with bake() and new data
set.seed(123)
new_df <- tibble(
  a = sample(letters, 3),
  b = rnorm(3),
  c = c(sample(1:10, 2), NA)
)
rec_imp_trained %>%
  bake(new_df)
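For comparison, this is roughly the result I expect for column c, written by hand with dplyr instead of {recipes}. It is only a sketch of my expectation (NA replaced by the minimum of the training data), not a claim about how step_lowerimpute() is meant to work internally. The names train_min and df_expected are just for illustration.

```r
library(dplyr)

# same training data as in the reprex
set.seed(123)
df <- tibble(
  a = letters[1:10],
  b = rnorm(10),
  c = c(rep(1, 3), rep(2, 2), rep(NA, 2), rep(10, 3))
)

# minimum of c in the training data, ignoring missing values
train_min <- min(df$c, na.rm = TRUE)

# replace NA in c with that minimum; all other values stay unchanged
df_expected <- df %>%
  mutate(c = if_else(is.na(c), train_min, c))

df_expected$c
```

So after baking I would expect c to contain no NA, with the two missing entries set to the training minimum.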
I'd be very glad to hear some feedback or get some help.
Cheers
David