My objective was to impute means for a set of variables based on values in another data frame. I wanted to use mutate_at()
but didn't know how to use the column name inside the function. I derived the solution below from this answer on Stack Overflow.
This works, but I have no clue why it does. It seems like with pull(), . is the actual column name, but in the next line . is a vector of numbers. What exactly does mutate_at() use as the argument in the function? Thanks.
library(dplyr)

# Training data: source of the means used for imputation
train <- tibble(income = c(10, 8, 7, 9, 4, 5),
                children = c(3, 5, 2, 7, 9, 10),
                home_value = c(4, 5, 8, 2, 4, 0))

# Test data: contains the missing values to fill in
test <- tibble(income = c(4, 5, 2, NA, 8, NA),
               children = c(3, 5, 10, 2, 4, NA),
               home_value = c(3, NA, NA, 4, 1, 5))

# Columns to impute
mvars <- c("income", "home_value")

test_imp <- test %>%
  mutate_at(mvars,
            list( ~ {
              imp_mean <- train %>% pull(.) %>% mean(na.rm = TRUE)
              # Replace missing values with imp_mean, keep the rest
              if_else(is.na(.), imp_mean, .)
            }))
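
One way to probe what . refers to in each position is sketched below; it is a hedged check, not a definitive answer. Inside a magrittr pipe, a . on the right-hand side stands for the left-hand side, so train %>% pull(.) should be the same call as pull(train), and pull() returns the last column when no column is given. That would suggest the snippet above uses the mean of home_value for every column, rather than looking columns up by name. The second part shows an alternative that looks up each column by name, assuming dplyr 1.0.0 or later for across() and cur_column(). The names piped and test_imp_by_col are illustrative only and are not from the linked answer.

```r
library(dplyr)

train <- tibble(income = c(10, 8, 7, 9, 4, 5),
                children = c(3, 5, 2, 7, 9, 10),
                home_value = c(4, 5, 8, 2, 4, 0))
test <- tibble(income = c(4, 5, 2, NA, 8, NA),
               children = c(3, 5, 10, 2, 4, NA),
               home_value = c(3, NA, NA, 4, 1, 5))
mvars <- c("income", "home_value")

# Check 1: in a pipe, `.` on the right-hand side is the piped-in object,
# so this is pull(train), which defaults to the last column of train.
piped <- train %>% pull(.)
print(identical(piped, train$home_value))

# Check 2 (alternative sketch, assumes dplyr >= 1.0.0):
# .x is the current column's values; cur_column() is its name,
# which is used to fetch the matching column from train.
test_imp_by_col <- test %>%
  mutate(across(all_of(mvars),
                ~ if_else(is.na(.x),
                          mean(train[[cur_column()]], na.rm = TRUE),
                          .x)))

print(test_imp_by_col)
```

With this version, missing values in income are filled from mean(train$income) and missing values in home_value from mean(train$home_value); children is left untouched because it is not in mvars.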