I frequently need to create summary statistics for a dataset across many different grouping variables, and I'm trying to find a good solution. My current approach works OK, but I'd like to set up a list of variables to pass to map_dfr instead of manually writing out each call inside bind_rows. The problem is that, as currently set up, the function takes bare column names, which are not defined as objects in the global environment, so I don't know how to build a list of them to loop over for the aggregation and summarizing.
I'd also appreciate any more elegant solutions to the general problem. The objective is to support an arbitrary number of grouping variables with arbitrary summarizing functions. In lay terms, I want to take a dataset, split it into groups, and calculate some value for each group, then repeat that process many times with different grouping variables. The results should then be compiled into a single data frame where I can easily select different groups or group values to compare the summary statistic.
library(tidyverse)
# create example data, with some arbitrary grouping variables
iris <- iris %>%
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width))
# Using count as a simple example, could be a lot of different things going on in here by group
fn_group_count <- function(group, data = iris) {
  group_c <- deparse(substitute(group))
  data %>%
    count({{ group }}) %>%
    rename(group_value = {{ group }}) %>%
    mutate(group = group_c,
           group_value = as.character(group_value)) %>%
    select(group, everything())
}
# using bind_rows with manual coding of each variable - would like to specify a list of variables instead
group_counts <- bind_rows(
  fn_group_count(Species),
  fn_group_count(widePetals),
  fn_group_count(longPetals)
)
# something like this - but this doesn't work because `Species` is not defined
group_vars <- c(Species, widePetals, longPetals)
map_dfr(group_vars, fn_group_count)
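One direction that might get around the undefined-object problem is to pass the column names as strings and look them up with the `.data` pronoun, so nothing has to exist in the global environment. The following is only a sketch under that assumption: `fn_group_count_chr`, its `group_name` argument, and the `iris2` copy of the data are hypothetical names introduced here for illustration, and `count` again stands in for whatever summary is actually needed.

```r
library(dplyr)
library(purrr)

# Example data mirroring the setup above (stored as iris2, a hypothetical
# name, so the built-in iris dataset is left untouched)
iris2 <- iris %>%
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width))

# Hypothetical string-based variant of fn_group_count:
# `group_name` is a column name given as a character string
fn_group_count_chr <- function(group_name, data = iris2) {
  data %>%
    # .data[[group_name]] looks the column up by its string name;
    # as.character() lets rows from different group types be stacked
    count(group_value = as.character(.data[[group_name]])) %>%
    # record which grouping variable each row came from, as the first column
    mutate(group = group_name, .before = 1)
}

# A plain character vector can now be defined and iterated over
group_vars <- c("Species", "widePetals", "longPetals")
group_counts <- map_dfr(group_vars, fn_group_count_chr)
```

With this shape, adding another grouping variable would only mean adding another string to `group_vars`. The trade-off is that the column names are no longer written as bare symbols, so typos only surface when the function runs.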