How to use recipes to impute unknown variables in the test set

alex628 · July 17, 2019, 8:59pm

I have the following reproducible example, which represents the common situation where some predictors are useful for training but are not known when a forecast is generated (e.g., temperature).

The goal is to fill in these missing values with a rolling median imputation based on the training set.

Is it possible to modify the following example to fill in the NA values in the test set with rolling origin values calculated from the training set?

library(lubridate)  # ymd(), days()
library(rsample)    # initial_time_split(), training(), testing()

set.seed(145)
example_data <-
  data.frame(
    day = ymd("2012-06-07") + days(1:12),
    x1 = round(runif(12), 2),
    x2 = round(runif(12), 2),
    x3 = round(runif(12), 2)
  )
d <- initial_time_split(example_data)
trn <- training(d)
tst <- testing(d)
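# Simulate a predictor that is unknown at forecast time.
# NA_real_ keeps the column numeric; a bare NA would make it logical.
tst$x2 <- NA_real_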

library(recipes)
seven_pt <- recipe(~ . , data = trn) %>%
  update_role(day, new_role = "time_index") %>%
  step_rollimpute(x2, window = 7) %>%
  prep(training = trn, retain = TRUE)

juice(seven_pt)
bake(seven_pt, new_data = tst)

Max · July 24, 2019, 1:40pm

How about step_knnimpute() instead?
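
A minimal sketch of what that suggestion could look like with the example data from the question. It assumes a recent recipes release, where step_knnimpute() is named step_impute_knn(); on the 2019 version the older name applies. The choice of neighbors = 5 and the object names knn_rec and imputed are illustrative assumptions, not part of the original post. Unlike a rolling window, the nearest neighbors are looked up in the training rows (using x1 and x3 here), so the test set can be imputed even when x2 is missing in every row.

```r
library(lubridate)  # ymd(), days()
library(rsample)    # initial_time_split(), training(), testing()
library(recipes)

set.seed(145)
example_data <-
  data.frame(
    day = ymd("2012-06-07") + days(1:12),
    x1 = round(runif(12), 2),
    x2 = round(runif(12), 2),
    x3 = round(runif(12), 2)
  )

d <- initial_time_split(example_data)
trn <- training(d)
tst <- testing(d)
tst$x2 <- NA_real_  # x2 is unknown at forecast time; keep the column numeric

# neighbors = 5 is an arbitrary illustrative choice.
# By default the step imputes from the other predictors (x1 and x3);
# day is excluded because its role is no longer "predictor".
knn_rec <- recipe(~ ., data = trn) %>%
  update_role(day, new_role = "time_index") %>%
  step_impute_knn(x2, neighbors = 5) %>%
  prep(training = trn)

# Missing x2 values in the test set are filled from training-set neighbors
imputed <- bake(knn_rec, new_data = tst)
imputed
```

Note that this replaces the rolling median with a different technique: the imputed values depend on how similar x1 and x3 are to the training rows, not on the most recent observations of x2.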
