dplyr way(s) and base R way(s) of creating age group from age

I would like to create age_group from the variable age. The desired age_group will have four categories: 0–14, 15–44, 45–64, and > 64. What is the most efficient way of generating the variable, both with dplyr and with base R?

# Data
toy_df <- data.frame(
  age = c(1, 2, 3, 17, 45, 54, 57, 68)
)

dplyr::case_when() has the benefit of readability.

@nirgrahamuk Thanks for the suggestion. It works nicely. How would you do the same thing using base R?

library(dplyr)

toy_df %>%
  mutate(
    # Create categories
    age_group = dplyr::case_when(
      age <= 14            ~ "0-14",
      age > 14 & age <= 44 ~ "15-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "> 64"
    ),
    # Convert to factor so the groups keep their natural order
    age_group = factor(
      age_group,
      levels = c("0-14", "15-44", "45-64", "> 64")
    )
  )

I don't think I would, as the base R solutions are messy:
nested ifelse() calls and so on.
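
For reference, a nested ifelse() version might look roughly like the sketch below. It rebuilds toy_df from the question so it runs on its own, and it assumes age has no missing values (NA ages would simply come out as NA). It is only one possible layout, not a recommended approach.

```r
# Toy data from the question
toy_df <- data.frame(age = c(1, 2, 3, 17, 45, 54, 57, 68))

# Nested ifelse(): each "no" branch carries the next test,
# so every later test only sees ages that failed the earlier ones
toy_df$age_group <- ifelse(
  toy_df$age <= 14, "0-14",
  ifelse(
    toy_df$age <= 44, "15-44",
    ifelse(toy_df$age <= 64, "45-64", "> 64")
  )
)

toy_df
```

The result is a plain character column; the nesting gets harder to read with every extra category, which is the messiness mentioned above.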

This is one of many ways to do it in base R:

toy_df[toy_df$age <= 14, "age_group"] <- "0-14"
toy_df[toy_df$age > 14 & toy_df$age <= 44, "age_group"] <- "15-44"
toy_df[toy_df$age > 44 & toy_df$age <= 64, "age_group"] <- "45-64"
toy_df[toy_df$age > 64, "age_group"] <- "> 64"
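
One thing to keep in mind: the assignments above leave age_group as a character column, whereas the dplyr version ends with a factor. A minimal sketch of the extra step, assuming the same toy_df and the same four labels:

```r
# Toy data and logical-index assignments, repeated so this runs on its own
toy_df <- data.frame(age = c(1, 2, 3, 17, 45, 54, 57, 68))
toy_df[toy_df$age <= 14, "age_group"] <- "0-14"
toy_df[toy_df$age > 14 & toy_df$age <= 44, "age_group"] <- "15-44"
toy_df[toy_df$age > 44 & toy_df$age <= 64, "age_group"] <- "45-64"
toy_df[toy_df$age > 64, "age_group"] <- "> 64"

# Convert to a factor with explicit levels; without `levels`,
# factor() would sort the labels alphabetically
toy_df$age_group <- factor(
  toy_df$age_group,
  levels = c("0-14", "15-44", "45-64", "> 64")
)

# Counts per group, reported in level order
table(toy_df$age_group)
```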

I feel like cut() is a significantly better solution.

toy_df$age_group <- cut(
  toy_df$age,
  breaks = c(0, 14, 44, 64, Inf),
  labels = c("0-14", "15-44", "45-64", "> 64"),
  include.lowest = TRUE
)

In this case cut() would indeed be better.
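
To see how cut() treats the boundaries, here is a small check. The vector edge_ages is a hypothetical set of ages chosen to sit on each cut point; it is not part of the original data. By default the intervals are closed on the right, and include.lowest = TRUE is what lets an age of exactly 0 fall into the first group.

```r
# Hypothetical boundary ages, one on each side of every cut point
edge_ages <- c(0, 14, 15, 44, 45, 64, 65)

edge_groups <- cut(
  edge_ages,
  breaks = c(0, 14, 44, 64, Inf),
  labels = c("0-14", "15-44", "45-64", "> 64"),
  include.lowest = TRUE
)

# cut() returns a factor whose levels follow the order of the breaks,
# so no separate factor() step is needed
data.frame(age = edge_ages, age_group = edge_groups)
```

Ages outside the breaks (negative values here) and NA ages come back as NA rather than raising an error, so it is worth checking for those separately.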
