I would like to mutate age_group from the variable age. The desired age_group will have four categories: 0–14, 15–44, 45–64, and > 64. What is the most efficient way of generating the variable, using dplyr and base R?
# Data
toy_df <- data.frame(
  age = c(1, 2, 3, 17, 45, 54, 57, 68)
)
dplyr::case_when has the benefit of readability.
@nirgrahamuk Thanks for the suggestion. It works nicely. How would you do the same thing using base R?
library(dplyr)

toy_df %>%
  mutate(
    # Create categories
    age_group = dplyr::case_when(
      age <= 14 ~ "0-14",
      age > 14 & age <= 44 ~ "15-44",
      age > 44 & age <= 64 ~ "45-64",
      age > 64 ~ "> 64"
    ),
    # Convert to factor
    age_group = factor(
      age_group,
      levels = c("0-14", "15-44", "45-64", "> 64")
    )
  )
I don't think I would, as the base R solutions are messy: nested ifelse() and so on.
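For reference, a nested ifelse() version might look like the sketch below. It assumes the toy_df from the question, and it mostly illustrates why the nesting gets hard to read as categories are added.

```r
# Toy data from the question
toy_df <- data.frame(age = c(1, 2, 3, 17, 45, 54, 57, 68))

# Nested ifelse(): each "no" branch holds the next test
toy_df$age_group <- ifelse(
  toy_df$age <= 14, "0-14",
  ifelse(
    toy_df$age <= 44, "15-44",
    ifelse(toy_df$age <= 64, "45-64", "> 64")
  )
)

# ifelse() returns character here, so convert to a factor with ordered levels
toy_df$age_group <- factor(
  toy_df$age_group,
  levels = c("0-14", "15-44", "45-64", "> 64")
)
```

Each extra category adds another level of nesting, which is the messiness I mean.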
This is one of many ways to do it in base R:
toy_df[toy_df$age <= 14, "age_group"] <- "0-14"
toy_df[toy_df$age > 14 & toy_df$age <= 44, "age_group"] <- "15-44"
toy_df[toy_df$age > 44 & toy_df$age <= 64, "age_group"] <- "45-64"
toy_df[toy_df$age > 64, "age_group"] <- "> 64"
I feel like cut is a significantly better solution.
toy_df$age_group <- cut(toy_df$age, breaks = c(0, 14, 44, 64, Inf),
                        labels = c("0-14", "15-44", "45-64", "> 64"), include.lowest = TRUE)
In this case cut() would indeed be better.
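If it helps to see how cut() treats the boundaries, here is a small sketch. The edge_ages vector is made up for illustration and is not part of the original data; the breaks and labels follow the categories from the question.

```r
# Hypothetical boundary ages, chosen only to probe the interval edges
edge_ages <- c(0, 14, 15, 44, 45, 64, 65)

edge_groups <- cut(
  edge_ages,
  breaks = c(0, 14, 44, 64, Inf),
  labels = c("0-14", "15-44", "45-64", "> 64"),
  include.lowest = TRUE  # without this, age 0 would fall outside (0, 14] and become NA
)

# cut() returns a factor whose levels follow the order of the labels
data.frame(age = edge_ages, age_group = edge_groups)
```

By default the intervals are closed on the right, so 14 lands in "0-14" and 15 in "15-44", which matches the case_when() conditions above.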
Closed by system on December 18, 2020, 3:12pm.