Question about dplyr's mutate_at

wikge · April 24, 2019, 1:17pm

My objective was to impute means for a set of variables based on values in another data frame. I wanted to use mutate_at but didn't know how to utilize the column name in the function. I derived the solution below from this answer on Stack Overflow.
This works, but I have no clue why it does. It seems like with pull, . is the actual column name but in the next line . is a vector of numbers. What exactly does mutate_at use as the argument in the function? Thanks.

library(dplyr)

train <- tibble(income = c(10, 8, 7, 9, 4, 5),
                children = c(3, 5, 2, 7, 9, 10),
                home_value = c(4, 5, 8, 2, 4, 0))
test <- tibble(income = c(4, 5, 2, NA, 8, NA),
               children = c(3, 5, 10, 2, 4, NA),
               home_value = c(3, NA, NA, 4, 1, 5))

mvars <- c("income", "home_value")

test_imp <- test %>%
  mutate_at(mvars,
            list( ~ {
              imp_mean <- train %>% pull(.) %>% mean(na.rm = TRUE)
              if_else(is.na(.), imp_mean, .)
            }))

martin.R · April 24, 2019, 3:53pm

The . refers to a column in each of the three occurrences.

The first one takes the column from train and the next two from test.
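Because %>% also treats . as a placeholder for its left-hand side, the pull(.) line can be hard to read. If you would rather spell out the column name than rely on how . is resolved, here is a minimal sketch of the same imputation. It assumes dplyr 1.0.0 or later, which provides across() and cur_column(); test_imp3 is just an illustrative name, not something from the original post.

```r
library(dplyr)

train <- tibble(income = c(10, 8, 7, 9, 4, 5),
                children = c(3, 5, 2, 7, 9, 10),
                home_value = c(4, 5, 8, 2, 4, 0))
test <- tibble(income = c(4, 5, 2, NA, 8, NA),
               children = c(3, 5, 10, 2, 4, NA),
               home_value = c(3, NA, NA, 4, 1, 5))

mvars <- c("income", "home_value")

test_imp3 <- test %>%
  mutate(across(all_of(mvars), ~ {
    # cur_column() gives the name of the column being processed, as a string,
    # so the lookup in train is explicit
    imp_mean <- mean(train[[cur_column()]], na.rm = TRUE)
    # .x is the current column of test, as a vector
    if_else(is.na(.x), imp_mean, .x)
  }))
```

Here the name (cur_column()) and the values (.x) are separate objects, so there is no ambiguity about which data frame each one comes from. The children column is not in mvars, so its NA is left alone.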

AdamSampson · April 24, 2019, 4:33pm

Looking through dplyr:::manip_apply_syms shows that the function basically replaces any . in your function code with the symbolic name of the column. Therefore your code is essentially equivalent to:

test_imp2 <- test %>%
  mutate(
    income = if_else(
      is.na(income),
      train %>% pull(income) %>% mean(na.rm = TRUE),
      income
    ),
    home_value = if_else(
      is.na(home_value),
      train %>% pull(home_value) %>% mean(na.rm = TRUE),
      home_value
    )
  )

The reason the same column name is treated differently in different parts of this code is that you have nested environments. train %>% pull(home_value) %>% mean(na.rm = TRUE) is its own environment: pull() checks whether home_value is an object in the data it was given and finds that yes, it is a column in train. In the if_else() call, which runs inside mutate() on test, home_value is instead found as a column in test.
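The point about the same name resolving against different data can be checked in isolation. This is a minimal sketch that uses only the home_value column from the thread; train_mean and test_filled are illustrative names, not part of the original code.

```r
library(dplyr)

train <- tibble(home_value = c(4, 5, 8, 2, 4, 0))
test <- tibble(home_value = c(3, NA, NA, 4, 1, 5))

# pull() looks up home_value in whatever data frame is piped into it,
# so this is the mean of train's column
train_mean <- train %>% pull(home_value) %>% mean(na.rm = TRUE)

# Inside mutate() on test, the bare name home_value refers to test's column
test_filled <- test %>%
  mutate(home_value = if_else(is.na(home_value), train_mean, home_value))
```

Computing the mean in a separate step does not change the result; it only makes it easier to see which data frame each home_value belongs to.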
