# Looking for examples displaying data with multi-level totals

[edited for clarity]

I'm looking for examples or ideas on how I might display a data table in a .Rmd document that has grouped totals at multiple levels. So, in the example following, the Net Sales, Units, and AUR have a total at the district level and company level.

Following I just crafted an example to illustrate what I mean:

The screenshot illustrates what I might display to the user and below was my R attempt to get a similar result minus the formating.

``````library(tidyverse)
library(xtable)

# Create dummy data
salesData <- data.frame(District=c(1,1,1,2,2,2),
Name=c("S1","S2","S3","S4","S5","S6"),
NetSales=c(1000,1500,3000,2000,1500,3000),
Units=c(50,120,52,12,64,52)) %>%
mutate(AUR = NetSales / Units)

# Generates District Totals
salesTotals <- salesData %>%
group_by(District) %>%
summarize(Name = "Total",
NetSales = sum(NetSales),
Units = sum(Units),
AUR = sum(NetSales) / sum(Units) )

# Generates Grand Total
grandTotals <- salesTotals %>%
summarize(
District = "",
Name = "GrandTotal",
NetSales = sum(NetSales),
Units = sum(Units),
AUR = round((sum(NetSales)/sum(Units),2)))

# Binds district totals to frame
totals <- rbind(salesData, salesTotals) %>%
arrange(District, Name)

#Binds grand total to frame
totals <- rbind(totals, grandTotals)

#Generate LaTeX table to display
xtable(totals)
``````

Which generates something like this using `xtable`

This creates a rather rough table and with some more table formatting in `xtable`, the table would look presentable.

• Is there a tidy verse way of doing this?
• How you might go about displaying this data to your users?
• Have you had to deal with this sort of display before in other projects?
1 Like

I'd suspect that there isn't really a tidyverse way to do this because your resulting data frame isn't very 'tidyverse'.

As a report I think your summary table can be great, simply because your audience understands and it ma hold exactly the information that they need. However as a table it isn't very tidy. Specifically the 'Name' column, it is a variable indicating a store and 'total' and 'grand total' are not stores. Since the tidyvese functions generally take a tidy data frame and return a tidy data frame I doubt that there is any mainstream functions that do this.

If you need this a lot you could write your own function and pipe your data through it.

``````salesData <- data.frame(District=as.character(c(1,1,1,2,2,2)),
Name=c("S1","S2","S3","S4","S5","S6"),
NetSales=c(1000,1500,3000,2000,1500,3000),
Units=c(50,120,52,12,64,52)) %>%
mutate(AUR = NetSales / Units)

report_salesdata <- function(salesdata) {
salesdata %>%
bind_rows(salesData %>%
group_by(District) %>%
summarize(Name = "Total",
NetSales = sum(NetSales),
Units = sum(Units),
AUR = sum(NetSales) / sum(Units) )
) %>%
arrange(District, Name) %>%
bind_rows(salesData %>%
summarize(
District = "",
Name = "GrandTotal",
NetSales = sum(NetSales),
Units = sum(Units),
AUR = round(sum(NetSales)/sum(Units),2))
)
}

salesData %>%
report_salesdata() %>%
xtable()
``````

Best,
Jiddu

2 Likes

Thanks @JidduAlexander!

My example is definitely not a very tidy thing to do.

I was hoping I've missed a package that would be something like ggplot that produces a customizable table output with multi-level grouping as a feature. I'd imagine (just brainstorming a little) creating an S3 object that includes the data.frame, a summary of the group(s), a grand total, and a table object that includes an HTML or LaTeX output that can be accessed and builds the output.

I'd imagine the syntax would look something like the following I've penciled out.

`````` <- salesdata %>%
ggtable() +
summarize_by_group(District) +
summarize_total() +
generate_table()
``````
1 Like

It would probably help everyone involved if there was a hard-and-fast idea of how the table would look.

Still, here's an example using arrays and the `addmargins()` function:

``````library(dplyr)
library(tidyr)

sales_array <- array(
c(salesData[["NetSales"]], salesData[["Units"]], salesData[["AUR"]]),
dim = c(
Store    = 3,
District = 2,
Measure  = 3
),
dimnames = list(
Store    = 1:3,
District = 1:2,
Measure  = c("NetSales", "Units", "AUR")
)
)

with_totals <- sales_array %>%
addmargins(1:2) %>%
as.table() %>%
as.data.frame() %>%
filter(!(District == "Sum" & Store != "Sum")) %>%
spread(Measure, Freq) %>%
arrange(District, Store)
``````

Posting this to StackOverflow could also help.

Thanks @nwerth

Thank you for the feedback.

Fwiw, the dev version of data.table recently added some functions by Jan Gorecki for this. I am just following Uwe's examples from SO:

``````library(data.table)
DT = data.table(salesData)

rollup(DT,
lapply(.SD, sum),
by = c("District", "Name"),
sets = list("District", "Name")
)[order(District, Name)]
``````

which gives

``````   District Name NetSales Units       AUR
1:        1   S1     1000    50  20.00000
2:        1   S2     1500   120  12.50000
3:        1   S3     3000    52  57.69231
4:        1   NA     5500   222  90.19231
5:        2   S4     2000    12 166.66667
6:        2   S5     1500    64  23.43750
7:        2   S6     3000    52  57.69231
8:        2   NA     6500   128 247.79647
9:       NA   NA    12000   350 337.98878
``````

I agree with the others that this is an untidy format not really useful for further analysis, but it may be handy for browsing data.

I don't really know the syntax for these functions well enough, so there is probably a simpler way to write the code above. For those who don't know data.table:

• `lapply(.SD, sum)` is summing all columns not involved in the `by=` list.
• `[order(x,y,z)]` is the same as `arrange(x,y,z)`.

You may find other answers in the SO link helpful as well, since it's a similar question.

1 Like