How to get summaries at different levels

Hi all,

I'm creating a bunch of basic status reports, and one of the things I'm finding tedious is adding a total row to all my tables. I'm currently using the Tidyverse approach, and below is an example of my current code. What I'm looking for is an option to have a few different summary levels included by default.

library(dplyr)

#load into RStudio viewer (not required)
iris = iris

#summary at the group level
summary_grouped = iris %>% 
       group_by(Species) %>%
       summarize(mean_s_length = mean(Sepal.Length),
                 max_s_width = max(Sepal.Width))

#summary at the overall level
summary_overall = iris %>% 
  summarize(mean_s_length = mean(Sepal.Length),
            max_s_width = max(Sepal.Width)) %>%
  mutate(Species = "Overall")
 
#append results for report       
summary_table = rbind(summary_grouped, summary_overall)

Doing this multiple times over is very tedious.

What I'd like is something along these lines (total = TRUE is made-up syntax):

summary_overall = iris %>% 
       group_by(Species, total = TRUE) %>%
       summarize(mean_s_length = mean(Sepal.Length),
                 max_s_width = max(Sepal.Width))

FYI - if you're familiar with SAS, I'm looking for the same type of functionality available via the CLASS, WAYS, or TYPES statements in PROC MEANS, which let me control the level of summarization and get multiple levels in one call.

Any help is appreciated. I know I can create my own function, but was hoping there is something that already exists.
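
For the single-grouping-variable case in the question, the two-step pattern can be wrapped once. This is only a minimal sketch, assuming dplyr 1.0 or later; summarise_with_total() and its total_label argument are hypothetical names, not an existing dplyr or janitor function.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: summarise by one grouping column,
# then append a row computed over the whole data set.
summarise_with_total <- function(.data, group, ..., total_label = "Overall") {
  group_name <- rlang::as_label(rlang::enquo(group))

  grouped <- .data %>%
    group_by({{ group }}) %>%
    summarise(..., .groups = "drop") %>%
    # store the key as character so the label row binds without coercion issues
    mutate(across(all_of(group_name), as.character))

  overall <- .data %>%
    summarise(...) %>%
    mutate(!!group_name := total_label)

  bind_rows(grouped, overall)
}

summary_table <- summarise_with_total(
  iris, Species,
  mean_s_length = mean(Sepal.Length),
  max_s_width   = max(Sepal.Width)
)
```

The summary expressions are written once and evaluated twice, so the grouped rows and the total row cannot drift apart. The grouping column comes back as character rather than factor, which is a deliberate simplification here.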

I'm 90% certain there is something like this in the janitor package, but I've written my own function:

library(tidyverse)
create_summaries <- function(.data, ...){
  .data <- .data %>%
    dplyr::mutate(overall = "overall")
  
  dots <- rlang::quos(overall, !!!rlang::enquos(...))
  
  grouping_vars <- purrr::accumulate(dots, c)
  
  purrr::map_dfr(grouping_vars, function(vars){
    name <- paste0(purrr::flatten(purrr::map(vars, rlang::as_label)), collapse = "_")
    
    .data %>%
      dplyr::group_by(!!!vars) %>%
      dplyr::summarise(mean_s_length = mean(Sepal.Length),
                       max_s_width = max(Sepal.Width)) %>%
      dplyr::mutate(summarization_level = name)
  }) %>%
    dplyr::select(summarization_level, max_s_width, mean_s_length)
}

create_summaries(iris, Species, Petal.Length) 
#> Warning: Unquoting language objects with `!!!` is deprecated as of rlang 0.4.0.
#> Please use `!!` instead.
#> 
#>   # Bad:
#>   dplyr::select(data, !!!enquo(x))
#> 
#>   # Good:
#>   dplyr::select(data, !!enquo(x))    # Unquote single quosure
#>   dplyr::select(data, !!!enquos(x))  # Splice list of quosures
#> 
#> This warning is displayed once per session.
#> # A tibble: 52 x 3
#>    summarization_level          max_s_width mean_s_length
#>    <chr>                              <dbl>         <dbl>
#>  1 ~_overall                            4.4          5.84
#>  2 overall_Species                      4.4          5.01
#>  3 overall_Species                      3.4          5.94
#>  4 overall_Species                      3.8          6.59
#>  5 overall_Species_Petal.Length         3.6          4.6 
#>  6 overall_Species_Petal.Length         3            4.3 
#>  7 overall_Species_Petal.Length         4            5.4 
#>  8 overall_Species_Petal.Length         3.9          4.84
#>  9 overall_Species_Petal.Length         4.2          4.92
#> 10 overall_Species_Petal.Length         4.4          5.15
#> # … with 42 more rows

Created on 2019-06-21 by the reprex package (v0.3.0)

You pass it all the variables that you want it to group by, and it creates a summary at each cumulative level: overall, then by the first variable, then by the first two, and so on. You can definitely improve it by adding a way to pass in the summary functions, but I thought it might be a good start for you. I might come back to it tomorrow 🙂
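
One way the improvement mentioned above might look, as a sketch rather than a finished function. The ~_overall label and the !!! deprecation warning in the output appear to come from purrr::accumulate() returning its first element as a bare quosure instead of a list of quosures, so this version builds each level by subsetting the list. It assumes dplyr 1.0 or later; summarise_levels() and its .group_vars argument are hypothetical names.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical rewrite of create_summaries(): grouping variables are passed
# with vars(), summary expressions are passed through `...`.
summarise_levels <- function(.data, .group_vars, ...) {
  group_names <- map_chr(.group_vars, as_label)

  # Level 0 is the overall summary; level i groups by the first i variables.
  levels <- map(0:length(.group_vars), function(i) {
    .data %>%
      group_by(!!!.group_vars[seq_len(i)]) %>%
      summarise(..., .groups = "drop") %>%
      mutate(
        summarization_level =
          paste(c("overall", group_names[seq_len(i)]), collapse = "_"),
        .before = 1
      )
  })

  # Grouping columns missing at a coarser level are filled with NA.
  bind_rows(levels)
}

result <- summarise_levels(
  iris, vars(Species, Petal.Length),
  mean_s_length = mean(Sepal.Length),
  max_s_width   = max(Sepal.Width)
)
```

Unlike the original, this keeps the grouping columns in the result, so each row can still be traced back to its Species and Petal.Length values.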

FYI, cross-posting questions is fine on this site; we just ask that you link to the question on other sites. 🙂 See the FAQ on cross-posting.

Here's the link to the same question on Stack Overflow: dataframe - Summarize data at different aggregate levels - R and tidyverse - Stack Overflow.

I think that first link is incorrect? Thanks for the note though!
