How do you simplify categorization with mutate and rlang?

TeeTrea · November 15, 2020, 6:29pm

I have a survey dataset with an age variable and I would like to categorize it into age brackets. I've created two variables to help with the categorization:

age_brackets <- c("18-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70+" )
age_bracket_list <- list(18:19, 20:24, 25:29, 30:34, 35:39, 40:44, 45:49, 50:54, 55:59, 60:64, 65:69, 70:100)

Now, I'd like to assign my observations to these age groups using as little code as possible while still being human-readable. Right now, this is what I have:

data %>%
  mutate(age_bracket = case_when(
    age %in% age_bracket_list[[1]] ~ age_brackets[1],
    age %in% age_bracket_list[[2]] ~ age_brackets[2],
    age %in% age_bracket_list[[3]] ~ age_brackets[3],
    age %in% age_bracket_list[[4]] ~ age_brackets[4],
    age %in% age_bracket_list[[5]] ~ age_brackets[5],
    age %in% age_bracket_list[[6]] ~ age_brackets[6],
    age %in% age_bracket_list[[7]] ~ age_brackets[7],
    age %in% age_bracket_list[[8]] ~ age_brackets[8],
    age %in% age_bracket_list[[9]] ~ age_brackets[9],
    age %in% age_bracket_list[[10]] ~ age_brackets[10],
    age %in% age_bracket_list[[11]] ~ age_brackets[11],
    age %in% age_bracket_list[[12]] ~ age_brackets[12]
  ))

I know that a combination of rlang verbs and map could get this job done more efficiently but I struggle to combine them in a useful way. I know that since rlang 0.4.0, curly curly could get this done efficiently but again, I'm not sure. Here's a try:

age_brackets_tibble <- tibble(age_brackets, age_bracket_list)
data %>%
    mutate(map(age_bracket = case_when(age %in% {age_brackets_tibble$age_bracket_list} ~ {age_brackets_tibble$age_brackets})))

I'm receiving the error:

Error: Problem with `mutate()` input `..1`.
x argument ".f" is missing, with no default
ℹ Input `..1` is `map(...)`.

I'm not necessarily looking to get rid of the error message (although it would be nice to know what I did wrong). What I'm really looking for are some pointers on how to approach this problem from a metaprogramming standpoint. Thanks!

jmcvw · November 15, 2020, 9:32pm

Hi TeeTrea,

Personally, I wouldn't use either map or case_when at all here (and probably not even rlang).

I would just index directly into the vector of age brackets.
A bonus (at least as far as this example is concerned) is that it does away with the need for the age_bracket_list list of vectors.

Here is a way that does use rlang (the %|% operator) to do that. It is concise and efficient, and it also avoids using {} inside case_when.

age_brackets <- c("18-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70+" )

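library(dplyr)
library(rlang)  # provides %|%, which replaces NA with a default value

data <- tibble(age = 18:100)

# age %/% 5 - 2 maps 18-19 to index 1, 20-24 to 2, ..., 70-74 to 12.
# Indices past 12 return NA, which %|% then fills with "70+".
data %>%
  mutate(age_bracket = age_brackets[age %/% 5 - 2] %|% "70+")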
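
One caveat on the index arithmetic: it relies on every age being 18 or older. Ages 15 to 17 would land on index 1 ("18-19"), and younger ages produce zero or negative indices. Below is a minimal sketch of a guarded variant in base R, assuming under-18 ages should come back as NA. `age_to_bracket` is a hypothetical helper name used for illustration, not something from rlang or dplyr.

```r
age_brackets <- c("18-19", "20-24", "25-29", "30-34", "35-39", "40-44",
                  "45-49", "50-54", "55-59", "60-64", "65-69", "70+")

# Hypothetical helper (not from the thread): maps ages to bracket labels.
age_to_bracket <- function(age) {
  # Integer division gives 1 for 18-19, 2 for 20-24, ..., 12 for 70-74.
  # pmin() caps larger values at 12 so every age of 70 or more maps to "70+".
  idx <- pmin(age %/% 5 - 2, length(age_brackets))
  # Assumption: ages under 18 have no bracket and should become NA.
  idx[which(age < 18)] <- NA
  age_brackets[idx]
}

age_to_bracket(c(17, 18, 19, 20, 69, 70, 100))
```

Inside a pipeline this would be used as mutate(age_bracket = age_to_bracket(age)).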

nirgrahamuk · November 16, 2020, 12:01am

Here is a possible way to do a sort of inner join between the ages and the bracket table:

age_brackets <- c("18-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70+" )
age_bracket_list <- list(18:19, 20:24, 25:29, 30:34, 35:39, 40:44, 45:49, 50:54, 55:59, 60:64, 65:69, 70:100)

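library(tidyverse)

age_brackets_tibble <- tibble(age_brackets,
                              age_bracket_list)

# 15 random ages between 2 and 61; the outer parentheses print the result
(age <- 1 + sample.int(60, 15))

# For each age, keep the bracket row whose range contains it and bind the age on.
# Ages that fall in no bracket (under 18 here) match no row.
map_dfr(age,
        ~ bind_cols(age = .,
                    filter(rowwise(age_brackets_tibble),
                           . %in% unlist(age_bracket_list))))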
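
The same lookup can also be written as a literal join, which may be easier to read than a row-wise filter. This is a sketch rather than a drop-in replacement: `age_lookup`, `data`, and the example ages are made up for illustration.

```r
library(dplyr)

age_brackets <- c("18-19", "20-24", "25-29", "30-34", "35-39", "40-44",
                  "45-49", "50-54", "55-59", "60-64", "65-69", "70+")
age_bracket_list <- list(18:19, 20:24, 25:29, 30:34, 35:39, 40:44,
                         45:49, 50:54, 55:59, 60:64, 65:69, 70:100)

# Long lookup table: one row per individual age, labelled with its bracket.
age_lookup <- tibble(
  age         = unlist(age_bracket_list),
  age_bracket = rep(age_brackets, times = lengths(age_bracket_list))
)

# Illustrative data; these ages are invented for the sketch.
data <- tibble(age = c(17L, 18L, 24L, 25L, 69L, 70L, 100L))

# inner_join() drops ages that have no bracket (17 here);
# left_join() would keep them with NA in age_bracket instead.
result <- inner_join(data, age_lookup, by = "age")
result
```

The lookup table is built once, so the per-row work is a plain join on age rather than a repeated scan of the list column.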
