Hi, I'm wondering what the suggested course of action would be when a recipe that uses
step_percentile encounters a new data value outside the range on which it was prepped.
```r
library(dplyr)
library(recipes)

train_df <- tibble(
  a = 1:10,
  b = 10:1
)

rec <- train_df %>%
  recipe(a ~ b) %>%
  step_percentile(
    b,
    options = list(
      probs = seq(0, 1, by = 1/4)
    )
  ) %>%
  prep()

new_df <- tibble(a = c(1, 4, 5), b = c(0.99, 5, 10.01))

bake(rec, new_data = new_df)
#> # A tibble: 3 x 2
#>        b     a
#>    <dbl> <dbl>
#> 1 NA         1
#> 2  0.444     4
#> 3 NA         5
```
I understand why it is returning NA, but I could see it being desirable to have values outside the range of the training data set to the highest/lowest quantile value instead. Since that isn't an option, would it simply be best to create a recipe step that caps the data to a pre-determined range?
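To make the capping idea concrete, here is a rough sketch of what I have in mind. It assumes that clamping `b` with `step_mutate()` before `step_percentile()` is an acceptable approach, and that the cap should be the training range; `lo`, `hi`, and `rec_capped` are just names I picked for illustration.

```r
library(dplyr)
library(recipes)

train_df <- tibble(a = 1:10, b = 10:1)
new_df   <- tibble(a = c(1, 4, 5), b = c(0.99, 5, 10.01))

# Assumed caps: the observed range of b in the training data
lo <- min(train_df$b)
hi <- max(train_df$b)

rec_capped <- train_df %>%
  recipe(a ~ b) %>%
  # Clamp b into [lo, hi]; !! inlines the values so the step
  # does not depend on lo/hi still existing at bake time
  step_mutate(b = pmin(pmax(b, !!lo), !!hi)) %>%
  step_percentile(
    b,
    options = list(probs = seq(0, 1, by = 1/4))
  ) %>%
  prep()

bake(rec_capped, new_data = new_df)
```

With this, the out-of-range rows should come back as the lowest and highest percentiles rather than NA, but the caps are hard-coded from the training data instead of being learned by the step itself, which is why I'm asking whether there is a better way.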