Hi @DavidJesse,
Good question. I had never actually used step_mutate() before, and the way I had it set up, it would recompute the minimum from whatever data the recipe was being applied to, which isn't ideal. You can get around that by replacing min(c) with !!min(df$c), so that the minimum is forced to come from df and not from whatever c variable is found in the data mask.
library(recipes)
library(tidyr)
library(dplyr)
set.seed(123)
df <- tibble(
  a = letters[1:10],
  b = rnorm(10),
  c = c(rep(1, 3), rep(2, 2), rep(NA, 2), rep(10, 3))
)
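rec <- recipe(~ ., data = df)
# `!!` evaluates min(df$c, na.rm = TRUE) immediately, so the step stores
# the training minimum (1) as a constant instead of the call min(c)
rec_imp <-
  rec %>%
  step_mutate(c = replace_na(c, !!min(df$c, na.rm = TRUE)))
rec_imp_trained <-
  rec_imp %>%
  prep()
# Training data: the NAs in rows 6 and 7 are filled with 1
juice(rec_imp_trained)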
#> # A tibble: 10 x 3
#>    a           b     c
#>    <fct>   <dbl> <dbl>
#>  1 a     -0.560      1
#>  2 b     -0.230      1
#>  3 c      1.56       1
#>  4 d      0.0705     2
#>  5 e      0.129      2
#>  6 f      1.72       1
#>  7 g      0.461      1
#>  8 h     -1.27      10
#>  9 i     -0.687     10
#> 10 j     -0.446     10
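# New data whose own minimum of c is 2, not 1
df2 <- tibble(
  a = letters[1:10],
  b = rnorm(10),
  c = c(rep(5, 3), rep(2, 2), rep(NA, 2), rep(10, 3))
)
# The NAs in rows 6 and 7 are still filled with 1, the minimum from df
bake(rec_imp_trained, new_data = df2)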
#> # A tibble: 10 x 3
#>    a           b     c
#>    <fct>   <dbl> <dbl>
#>  1 a      0.549      5
#>  2 b      0.238      5
#>  3 c     -1.05       5
#>  4 d      1.29       2
#>  5 e      0.826      2
#>  6 f     -0.0557     1
#>  7 g     -0.784      1
#>  8 h     -0.734     10
#>  9 i     -0.216     10
#> 10 j     -0.335     10
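
If it helps to see the difference side by side, here is a minimal sketch comparing the version without `!!` to the version with it. The names `train`, `new`, `rec_naive`, and `rec_fixed` are made up for this example and are not part of the reprex above; the data are just a smaller version of the same idea (training minimum 1, new-data minimum 2).

```r
library(recipes)
library(tidyr)
library(dplyr)

# Hypothetical toy data: minimum of c is 1 in train and 2 in new
train <- tibble(c = c(1, 1, 2, NA, 10))
new   <- tibble(c = c(5, 2, NA, 10))

# What `!!` does: the value is computed now and inlined into the expression
rlang::expr(replace_na(c, !!min(train$c, na.rm = TRUE)))

# Without `!!`: min(c) is evaluated again on whatever data is being baked
rec_naive <- recipe(~ ., data = train) %>%
  step_mutate(c = replace_na(c, min(c, na.rm = TRUE))) %>%
  prep()

# With `!!`: the training minimum is fixed when the step is defined
rec_fixed <- recipe(~ ., data = train) %>%
  step_mutate(c = replace_na(c, !!min(train$c, na.rm = TRUE))) %>%
  prep()

naive_c <- bake(rec_naive, new_data = new)$c
fixed_c <- bake(rec_fixed, new_data = new)$c

naive_c  # the NA is filled with 2, the minimum of the new data
fixed_c  # the NA is filled with 1, the minimum of the training data
```

The trade-off is that the unquoted value is tied to `train` at the moment the step is written, so if the training data changes you would need to rebuild the recipe rather than just re-running prep().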