declarative basic pivot table construction |> Table Contest

evamaerey · June 15, 2024, 6:20pm

declarative basic pivot table construction

Authors: Gina Reynolds

Full Description:
TLDR: Declaratively and incrementally construct simple pivot tables kind of like this

Another version of the following discussion which is probably a bit more compelling exists here

So, this project is about creating summary tables where cells are like-in-kind (counts, sums, averages, proportions) through a declarative interface. The function tidypivot::pivotr makes it possible to 'compose' such simple tables. It should give you a ggplot2-like feel. Which variable should be put on the x axis (columns) and the y axis (rows)? What variable should be reflected in the cells and using what computation?

We will the {tidytitanic} datasets in many of our tabulation examples. GitHub - EvaMaeRey/tidytitanic: flat versions of the r titanic data The tidy_titanic dataframe has a row for every passenger aboard the titanic, while flat_titanic aggregates, providing a frequency of passengers with same characteristics (e.g. 3rd Class, Male, Child, Survived).

library(tidytitanic)

head(tidy_titanic, 3)
#>     id class  sex   age survived
#> 3    1   3rd Male Child       No
#> 3.1  2   3rd Male Child       No
#> 3.2  3   3rd Male Child       No

head(flat_titanic, 3)
#>   class    sex   age survived freq
#> 1   1st   Male Child       No    0
#> 2   2nd   Male Child       No    0
#> 3   3rd   Male Child       No   35

Table 1. Let's see how to make a simple cross tab, inputing data and describing layout.

tidy_titanic |> pivotr(rows = sex, cols = survived)
#> # A tibble: 2 × 3
#>   sex       No   Yes
#>   <fct>  <dbl> <dbl>
#> 1 Male    1364   367
#> 2 Female   126   344

Table 2. Multiple row variables or multiple col variables are allow.

tidy_titanic |> pivotr(rows = c(sex, age), cols = survived)
#> # A tibble: 4 × 4
#>   sex    age      No   Yes
#>   <fct>  <fct> <dbl> <dbl>
#> 1 Male   Child    35    29
#> 2 Male   Adult  1329   338
#> 3 Female Child    17    28
#> 4 Female Adult   109   316

The default functionality is to count the input dataframes rows, but arguments value, wt, fun, allow for the cells to show many other computations.

Table 3. For example, we can look at percentages using the following specification.

flat_titanic |> pivotr(rows = sex, cols = survived, value = freq, percent = TRUE, within = sex)
#> # A tibble: 2 × 3
#>   sex       No   Yes
#>   <fct>  <dbl> <dbl>
#> 1 Male    78.8  21.2
#> 2 Female  26.8  73.2

A number of convenience functions based on pivotr help produce informative tables. Here we use more detailed data about the titanic passengers, which is used in Claus Wilke's data visualization book. Names of passengers and precise age of passengers is provided.

passengers <- readr::read_csv("https://raw.githubusercontent.com/clauswilke/dviz.supp/master/data-raw/titanic/Titanic.csv")

head(passengers)
#> # A tibble: 6 × 7
#>    ...1 Name                                 PClass   Age Sex   Survived SexCode
#>   <dbl> <chr>                                <chr>  <dbl> <chr>    <dbl>   <dbl>
#> 1     1 Allen, Miss Elisabeth Walton         1st    29    fema…        1       1
#> 2     2 Allison, Miss Helen Loraine          1st     2    fema…        0       1
#> 3     3 Allison, Mr Hudson Joshua Creighton  1st    30    male         0       0
#> 4     4 Allison, Mrs Hudson JC (Bessie Wald… 1st    25    fema…        0       1
#> 5     5 Allison, Master Hudson Trevor        1st     0.92 male         1       0
#> 6     6 Anderson, Mr Harry                   1st    47    male         1       0

Table 4. We use pivot_example to sample once from the variable 'Name', giving us a specific example for each cell.

passengers |> pivot_example(rows = Survived, cols = Sex, value = Name)
#> # A tibble: 2 × 3
#>   Survived female                     male                          
#>      <dbl> <chr>                      <chr>                         
#> 1        0 Solvang, Mrs Lena Jacobsen Meyer, Mr August              
#> 2        1 Gibson, Miss Dorothy       Williams, Mr Richard Norris II

Table 5. If we want more examples for each cell, pivot_samplen does the trick (there are many missing ages)

passengers |> pivot_samplen(rows = Survived, cols = Sex, value = Age, n = 7) 
#> # A tibble: 2 × 3
#>   Survived female                    male                      
#>      <dbl> <chr>                     <chr>                     
#> 1        0 NA; 44; 20; NA; 18; 2; NA NA; NA; 28; NA; 19; NA; NA
#> 2        1 22; 5; 59; 12; 13; 26; NA 32; 9; 35; 60; NA; NA; NA

Piped interface!

Moving on to the piped interface - which is a work in progress. The current syntax name choices nods to the ggplot2 as inspiration. A first cut at making these simple pivot table builds pipable is here And in this walk-through I try to capture the table build and evolution experience ...

Table 6. Four grouping variables via piping interface !

ggtable(data = tidytitanic::tidy_titanic) |>
  set_rows(c(sex, survived)) |>
  set_cols(c(age, class))

## # A tibble: 4 × 9
## # Groups:   sex, survived [4]
##   sex    survived Child_3rd Adult_1st Adult_2nd Adult_3rd Adult_Crew Child_1st
##   <fct>  <fct>        <int>     <int>     <int>     <int>      <int>     <int>
## 1 Male   No              35       118       154       387        670        NA
## 2 Male   Yes             13        57        14        75        192         5
## 3 Female No              17         4        13        89          3        NA
## 4 Female Yes             14       140        80        76         20         1
## # ℹ 1 more variable: Child_2nd <int>

Tables 7-10. A family of ~~plots~~ tables using last_table() with |> set_rows()/set_cols() updates.


ggtable(data = titanic)
##      n
## 1 2201

last_table() |>
  set_cols(sex)
## # A tibble: 1 × 2
##    Male Female
##   <int>  <int>
## 1  1731    470

last_table() |>
  set_rows(survived)
## # A tibble: 2 × 3
## # Groups:   survived [2]
##   survived  Male Female
##   <fct>    <int>  <int>
## 1 No        1364    126
## 2 Yes        367    344

last_table() |>
  set_cols(NULL)
## # A tibble: 2 × 2
## # Groups:   survived [2]
##   survived     n
##   <fct>    <int>
## 1 No        1490
## 2 Yes        711

Table Type: static-HTML
Submission Type: Single Table Example
Table: tablebuildup
Code: GitHub - EvaMaeRey/tidypivot: Declarative group-wise count and compute, describing your target table; mytidytuesday/2024-05-27-r6/r6.Rmd at master · EvaMaeRey/mytidytuesday · GitHub
Language:
Industries: Given the commonness of the task, this project might be of broad interest: BI, government, reporting, sciences, etc.
Packages: tidypivot (which wraps dplyr and tidyr manipulation), R6 towards piping, ggplot2 (syntax, feel)