# dplyr way(s) and base R way(s) of creating age group from age

I would like to create an `age_group` variable from the variable `age`. The desired `age_group` will have four categories: 0–14, 15–44, 45–64, and > 64. What is the most efficient way to generate this variable, both with dplyr and with base R?

``````
# Data
toy_df <- data.frame(
  age = c(1, 2, 3, 17, 45, 54, 57, 68)
)
``````

`dplyr::case_when()` has the benefit of readability.

@nirgrahamuk Thanks for the suggestion. It works nicely. How would you do the same thing using base R?

``````
library(dplyr)

toy_df %>%
  mutate(
    # Create categories
    age_group = dplyr::case_when(
      age <= 14            ~ "0-14",
      age > 14 & age <= 44 ~ "15-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64             ~ "> 64"
    ),
    # Convert to factor so the groups keep their natural order
    age_group = factor(
      age_group,
      levels = c("0-14", "15-44", "45-64", "> 64")
    )
  )
``````

I don't think I would, as the base R solutions are messy: nested `ifelse()` calls, etc.
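For reference, a nested `ifelse()` version might look roughly like the sketch below. It assumes the same `toy_df` and the same four labels as in the question; the layout is just one possible way to write it, not a recommendation.

```r
# Toy data from the question
toy_df <- data.frame(age = c(1, 2, 3, 17, 45, 54, 57, 68))

# Nested ifelse(): each "no" branch holds the next test,
# so a row falls through until a condition is TRUE
toy_df$age_group <- ifelse(
  toy_df$age <= 14, "0-14",
  ifelse(
    toy_df$age <= 44, "15-44",
    ifelse(toy_df$age <= 64, "45-64", "> 64")
  )
)

# Convert to a factor so the groups keep their natural order
toy_df$age_group <- factor(
  toy_df$age_group,
  levels = c("0-14", "15-44", "45-64", "> 64")
)
```

Every extra category adds another level of nesting, which is where the messiness comes from.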

This is one of many ways to do it in base R:

``````
toy_df[toy_df$age <= 14, "age_group"] <- "0-14"
toy_df[toy_df$age > 14 & toy_df$age <= 44, "age_group"] <- "15-44"
toy_df[toy_df$age > 44 & toy_df$age <= 64, "age_group"] <- "45-64"
toy_df[toy_df$age > 64, "age_group"] <- "> 64"
``````

I feel like `cut()` is a significantly better solution.

``````
toy_df["age_group"] <- cut(toy_df$age, breaks = c(0, 14, 44, 64, Inf),
                           labels = c("0-14", "15-44", "45-64", "> 64"), include.lowest = TRUE)
``````

In this case `cut()` would indeed be better.
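To see how `cut()` treats the boundaries, here is a small base R sketch. The names `age_breaks`, `age_labels`, and `edge_groups` are made up for illustration, and the edge-case ages are not part of the original data. Intervals are closed on the right by default, and `include.lowest = TRUE` makes the first interval include 0.

```r
# Toy data from the question
toy_df <- data.frame(age = c(1, 2, 3, 17, 45, 54, 57, 68))

# Hypothetical helper vectors: breaks and labels kept side by side
age_breaks <- c(0, 14, 44, 64, Inf)
age_labels <- c("0-14", "15-44", "45-64", "> 64")

# cut() returns a factor whose levels follow the order of the labels
toy_df$age_group <- cut(
  toy_df$age,
  breaks = age_breaks,
  labels = age_labels,
  include.lowest = TRUE
)

# Count rows per group
table(toy_df$age_group)

# Boundary check: 14 lands in "0-14" and 15 in "15-44";
# -1 lies outside the breaks, so it becomes NA
edge_groups <- cut(
  c(0, 14, 15, 64, 65, -1),
  breaks = age_breaks,
  labels = age_labels,
  include.lowest = TRUE
)
edge_groups
```

The same `cut()` call can also be used inside `dplyr::mutate()`, which covers the dplyr half of the question without a separate `factor()` step.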
