Looking for examples displaying data with multi-level totals


#1

[edited for clarity]

I’m looking for examples or ideas on how I might display a data table in a .Rmd document that has grouped totals at multiple levels. So, in the example following, the Net Sales, Units, and AUR have a total at the district level and company level.

Following I just crafted an example to illustrate what I mean:

The screenshot illustrates what I might display to the user and below was my R attempt to get a similar result minus the formating.

library(tidyverse)
library(xtable)

# Create dummy data
salesData <- data.frame(District=c(1,1,1,2,2,2), 
                        Name=c("S1","S2","S3","S4","S5","S6"),
                        NetSales=c(1000,1500,3000,2000,1500,3000),
                        Units=c(50,120,52,12,64,52)) %>%
                        mutate(AUR = NetSales / Units)
                      
# Generates District Totals
salesTotals <- salesData %>% 
  group_by(District) %>% 
  summarize(Name = "Total", 
            NetSales = sum(NetSales),
            Units = sum(Units), 
            AUR = sum(NetSales) / sum(Units) )

# Generates Grand Total
grandTotals <- salesTotals %>%
  summarize(
            District = "", 
            Name = "GrandTotal", 
             NetSales = sum(NetSales),
             Units = sum(Units), 
             AUR = round((sum(NetSales)/sum(Units),2)))

# Binds district totals to frame
totals <- rbind(salesData, salesTotals) %>%
              arrange(District, Name)

#Binds grand total to frame
totals <- rbind(totals, grandTotals) 

#Generate LaTeX table to display 
xtable(totals)

Which generates something like this using xtable

This creates a rather rough table and with some more table formatting in xtable, the table would look presentable.

  • Is there a tidy verse way of doing this?
  • How you might go about displaying this data to your users?
  • Have you had to deal with this sort of display before in other projects?

#2

I’d suspect that there isn’t really a tidyverse way to do this because your resulting data frame isn’t very ‘tidyverse’.

As a report I think your summary table can be great, simply because your audience understands and it ma hold exactly the information that they need. However as a table it isn’t very tidy. Specifically the ‘Name’ column, it is a variable indicating a store and ‘total’ and ‘grand total’ are not stores. Since the tidyvese functions generally take a tidy data frame and return a tidy data frame I doubt that there is any mainstream functions that do this.

If you need this a lot you could write your own function and pipe your data through it.

salesData <- data.frame(District=as.character(c(1,1,1,2,2,2)), 
                        Name=c("S1","S2","S3","S4","S5","S6"),
                        NetSales=c(1000,1500,3000,2000,1500,3000),
                        Units=c(50,120,52,12,64,52)) %>%
  mutate(AUR = NetSales / Units)

report_salesdata <- function(salesdata) {
  salesdata %>% 
    bind_rows(salesData %>% 
                group_by(District) %>% 
                summarize(Name = "Total", 
                          NetSales = sum(NetSales),
                          Units = sum(Units), 
                          AUR = sum(NetSales) / sum(Units) )
    ) %>% 
    arrange(District, Name) %>% 
    bind_rows(salesData %>% 
                summarize(
                  District = "", 
                  Name = "GrandTotal", 
                  NetSales = sum(NetSales),
                  Units = sum(Units), 
                  AUR = round(sum(NetSales)/sum(Units),2))
    )
}

salesData %>% 
  report_salesdata() %>% 
  xtable()

Best,
Jiddu


#3

Thanks @JidduAlexander!

My example is definitely not a very tidy thing to do.

I was hoping I’ve missed a package that would be something like ggplot that produces a customizable table output with multi-level grouping as a feature. I’d imagine (just brainstorming a little) creating an S3 object that includes the data.frame, a summary of the group(s), a grand total, and a table object that includes an HTML or LaTeX output that can be accessed and builds the output.

I’d imagine the syntax would look something like the following I’ve penciled out.

 <- salesdata %>% 
     ggtable() +
     summarize_by_group(District) +
     summarize_total() + 
     generate_table()

#4

It would probably help everyone involved if there was a hard-and-fast idea of how the table would look.

Still, here’s an example using arrays and the addmargins() function:

library(dplyr)
library(tidyr)

sales_array <- array(
  c(salesData[["NetSales"]], salesData[["Units"]], salesData[["AUR"]]),
  dim = c(
    Store    = 3,
    District = 2,
    Measure  = 3
  ),
  dimnames = list(
    Store    = 1:3,
    District = 1:2,
    Measure  = c("NetSales", "Units", "AUR")
  )
)

with_totals <- sales_array %>%
  addmargins(1:2) %>%
  as.table() %>%
  as.data.frame() %>%
  filter(!(District == "Sum" & Store != "Sum")) %>%
  spread(Measure, Freq) %>%
  arrange(District, Store)

Posting this to StackOverflow could also help.


#5

Thanks @nwerth

Thank you for the feedback.


#6

Fwiw, the dev version of data.table recently added some functions by Jan Gorecki for this. I am just following Uwe’s examples from SO:

library(data.table)
DT = data.table(salesData)

rollup(DT, 
  lapply(.SD, sum), 
  by = c("District", "Name"), 
  sets = list("District", "Name")
)[order(District, Name)]

which gives

   District Name NetSales Units       AUR
1:        1   S1     1000    50  20.00000
2:        1   S2     1500   120  12.50000
3:        1   S3     3000    52  57.69231
4:        1   NA     5500   222  90.19231
5:        2   S4     2000    12 166.66667
6:        2   S5     1500    64  23.43750
7:        2   S6     3000    52  57.69231
8:        2   NA     6500   128 247.79647
9:       NA   NA    12000   350 337.98878

I agree with the others that this is an untidy format not really useful for further analysis, but it may be handy for browsing data.

I don’t really know the syntax for these functions well enough, so there is probably a simpler way to write the code above. For those who don’t know data.table:

  • lapply(.SD, sum) is summing all columns not involved in the by= list.
  • [order(x,y,z)] is the same as arrange(x,y,z).

You may find other answers in the SO link helpful as well, since it’s a similar question.