Looking for examples displaying data with multi-level totals

cdr6934 · September 26, 2017, 3:20am

[edited for clarity]

I'm looking for examples or ideas on how I might display a data table in a .Rmd document that has grouped totals at multiple levels. So, in the example following, the Net Sales, Units, and AUR have a total at the district level and company level.

Following I just crafted an example to illustrate what I mean:

The screenshot illustrates what I might display to the user and below was my R attempt to get a similar result minus the formating.

library(tidyverse)
library(xtable)

# Create dummy data
salesData <- data.frame(District=c(1,1,1,2,2,2), 
                        Name=c("S1","S2","S3","S4","S5","S6"),
                        NetSales=c(1000,1500,3000,2000,1500,3000),
                        Units=c(50,120,52,12,64,52)) %>%
                        mutate(AUR = NetSales / Units)
                      
# Generates District Totals
salesTotals <- salesData %>% 
  group_by(District) %>% 
  summarize(Name = "Total", 
            NetSales = sum(NetSales),
            Units = sum(Units), 
            AUR = sum(NetSales) / sum(Units) )

# Generates Grand Total
grandTotals <- salesTotals %>%
  summarize(
            District = "", 
            Name = "GrandTotal", 
             NetSales = sum(NetSales),
             Units = sum(Units), 
             AUR = round((sum(NetSales)/sum(Units),2)))

# Binds district totals to frame
totals <- rbind(salesData, salesTotals) %>%
              arrange(District, Name)

#Binds grand total to frame
totals <- rbind(totals, grandTotals) 

#Generate LaTeX table to display 
xtable(totals)

Which generates something like this using xtable

This creates a rather rough table and with some more table formatting in xtable, the table would look presentable.

Is there a tidy verse way of doing this?
How you might go about displaying this data to your users?
Have you had to deal with this sort of display before in other projects?

JidduAlexander · September 26, 2017, 2:37pm

I'd suspect that there isn't really a tidyverse way to do this because your resulting data frame isn't very 'tidyverse'.

As a report I think your summary table can be great, simply because your audience understands and it ma hold exactly the information that they need. However as a table it isn't very tidy. Specifically the 'Name' column, it is a variable indicating a store and 'total' and 'grand total' are not stores. Since the tidyvese functions generally take a tidy data frame and return a tidy data frame I doubt that there is any mainstream functions that do this.

If you need this a lot you could write your own function and pipe your data through it.

salesData <- data.frame(District=as.character(c(1,1,1,2,2,2)), 
                        Name=c("S1","S2","S3","S4","S5","S6"),
                        NetSales=c(1000,1500,3000,2000,1500,3000),
                        Units=c(50,120,52,12,64,52)) %>%
  mutate(AUR = NetSales / Units)

report_salesdata <- function(salesdata) {
  salesdata %>% 
    bind_rows(salesData %>% 
                group_by(District) %>% 
                summarize(Name = "Total", 
                          NetSales = sum(NetSales),
                          Units = sum(Units), 
                          AUR = sum(NetSales) / sum(Units) )
    ) %>% 
    arrange(District, Name) %>% 
    bind_rows(salesData %>% 
                summarize(
                  District = "", 
                  Name = "GrandTotal", 
                  NetSales = sum(NetSales),
                  Units = sum(Units), 
                  AUR = round(sum(NetSales)/sum(Units),2))
    )
}

salesData %>% 
  report_salesdata() %>% 
  xtable()

Best,
Jiddu

cdr6934 · September 26, 2017, 3:57pm

Thanks @JidduAlexander!

My example is definitely not a very tidy thing to do.

I was hoping I've missed a package that would be something like ggplot that produces a customizable table output with multi-level grouping as a feature. I'd imagine (just brainstorming a little) creating an S3 object that includes the data.frame, a summary of the group(s), a grand total, and a table object that includes an HTML or LaTeX output that can be accessed and builds the output.

I'd imagine the syntax would look something like the following I've penciled out.

 <- salesdata %>% 
     ggtable() +
     summarize_by_group(District) +
     summarize_total() + 
     generate_table()

nwerth · September 26, 2017, 5:34pm

It would probably help everyone involved if there was a hard-and-fast idea of how the table would look.

Still, here's an example using arrays and the addmargins() function:

library(dplyr)
library(tidyr)

sales_array <- array(
  c(salesData[["NetSales"]], salesData[["Units"]], salesData[["AUR"]]),
  dim = c(
    Store    = 3,
    District = 2,
    Measure  = 3
  ),
  dimnames = list(
    Store    = 1:3,
    District = 1:2,
    Measure  = c("NetSales", "Units", "AUR")
  )
)

with_totals <- sales_array %>%
  addmargins(1:2) %>%
  as.table() %>%
  as.data.frame() %>%
  filter(!(District == "Sum" & Store != "Sum")) %>%
  spread(Measure, Freq) %>%
  arrange(District, Store)

Posting this to StackOverflow could also help.

cdr6934 · September 26, 2017, 5:54pm

Thanks @nwerth

Thank you for the feedback.

Frank · September 26, 2017, 6:54pm

Fwiw, the dev version of data.table recently added some functions by Jan Gorecki for this. I am just following Uwe's examples from SO:

library(data.table)
DT = data.table(salesData)

rollup(DT, 
  lapply(.SD, sum), 
  by = c("District", "Name"), 
  sets = list("District", "Name")
)[order(District, Name)]

which gives

   District Name NetSales Units       AUR
1:        1   S1     1000    50  20.00000
2:        1   S2     1500   120  12.50000
3:        1   S3     3000    52  57.69231
4:        1   NA     5500   222  90.19231
5:        2   S4     2000    12 166.66667
6:        2   S5     1500    64  23.43750
7:        2   S6     3000    52  57.69231
8:        2   NA     6500   128 247.79647
9:       NA   NA    12000   350 337.98878

I agree with the others that this is an untidy format not really useful for further analysis, but it may be handy for browsing data.

I don't really know the syntax for these functions well enough, so there is probably a simpler way to write the code above. For those who don't know data.table:

lapply(.SD, sum) is summing all columns not involved in the by= list.
[order(x,y,z)] is the same as arrange(x,y,z).

You may find other answers in the SO link helpful as well, since it's a similar question.