declarative basic pivot table construction
Authors: Gina Reynolds
Full Description:
TLDR: Declaratively and incrementally construct simple pivot tables kind of like this
Another version of the following discussion which is probably a bit more compelling exists here
So, this project is about creating summary tables where cells are like-in-kind (counts, sums, averages, proportions) through a declarative interface. The function tidypivot::pivotr makes it possible to 'compose' such simple tables. It should give you a ggplot2-like feel. Which variable should be put on the x axis (columns) and the y axis (rows)? What variable should be reflected in the cells and using what computation?
We will the {tidytitanic} datasets in many of our tabulation examples. GitHub - EvaMaeRey/tidytitanic: flat versions of the r titanic data The tidy_titanic
dataframe has a row for every passenger aboard the titanic, while flat_titanic
aggregates, providing a frequency of passengers with same characteristics (e.g. 3rd Class, Male, Child, Survived).
library(tidytitanic)
head(tidy_titanic, 3)
#> id class sex age survived
#> 3 1 3rd Male Child No
#> 3.1 2 3rd Male Child No
#> 3.2 3 3rd Male Child No
head(flat_titanic, 3)
#> class sex age survived freq
#> 1 1st Male Child No 0
#> 2 2nd Male Child No 0
#> 3 3rd Male Child No 35
Table 1. Let's see how to make a simple cross tab, inputing data and describing layout.
tidy_titanic |> pivotr(rows = sex, cols = survived)
#> # A tibble: 2 × 3
#> sex No Yes
#> <fct> <dbl> <dbl>
#> 1 Male 1364 367
#> 2 Female 126 344
Table 2. Multiple row variables or multiple col variables are allow.
tidy_titanic |> pivotr(rows = c(sex, age), cols = survived)
#> # A tibble: 4 × 4
#> sex age No Yes
#> <fct> <fct> <dbl> <dbl>
#> 1 Male Child 35 29
#> 2 Male Adult 1329 338
#> 3 Female Child 17 28
#> 4 Female Adult 109 316
The default functionality is to count the input dataframes rows, but arguments value, wt, fun, allow for the cells to show many other computations.
Table 3. For example, we can look at percentages using the following specification.
flat_titanic |> pivotr(rows = sex, cols = survived, value = freq, percent = TRUE, within = sex)
#> # A tibble: 2 × 3
#> sex No Yes
#> <fct> <dbl> <dbl>
#> 1 Male 78.8 21.2
#> 2 Female 26.8 73.2
A number of convenience functions based on pivotr help produce informative tables. Here we use more detailed data about the titanic passengers, which is used in Claus Wilke's data visualization book. Names of passengers and precise age of passengers is provided.
passengers <- readr::read_csv("https://raw.githubusercontent.com/clauswilke/dviz.supp/master/data-raw/titanic/Titanic.csv")
head(passengers)
#> # A tibble: 6 × 7
#> ...1 Name PClass Age Sex Survived SexCode
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 1 Allen, Miss Elisabeth Walton 1st 29 fema… 1 1
#> 2 2 Allison, Miss Helen Loraine 1st 2 fema… 0 1
#> 3 3 Allison, Mr Hudson Joshua Creighton 1st 30 male 0 0
#> 4 4 Allison, Mrs Hudson JC (Bessie Wald… 1st 25 fema… 0 1
#> 5 5 Allison, Master Hudson Trevor 1st 0.92 male 1 0
#> 6 6 Anderson, Mr Harry 1st 47 male 1 0
Table 4. We use pivot_example to sample once from the variable 'Name', giving us a specific example for each cell.
passengers |> pivot_example(rows = Survived, cols = Sex, value = Name)
#> # A tibble: 2 × 3
#> Survived female male
#> <dbl> <chr> <chr>
#> 1 0 Solvang, Mrs Lena Jacobsen Meyer, Mr August
#> 2 1 Gibson, Miss Dorothy Williams, Mr Richard Norris II
Table 5. If we want more examples for each cell, pivot_samplen does the trick (there are many missing ages)
passengers |> pivot_samplen(rows = Survived, cols = Sex, value = Age, n = 7)
#> # A tibble: 2 × 3
#> Survived female male
#> <dbl> <chr> <chr>
#> 1 0 NA; 44; 20; NA; 18; 2; NA NA; NA; 28; NA; 19; NA; NA
#> 2 1 22; 5; 59; 12; 13; 26; NA 32; 9; 35; 60; NA; NA; NA
Piped interface!
Moving on to the piped interface - which is a work in progress. The current syntax name choices nods to the ggplot2 as inspiration. A first cut at making these simple pivot table builds pipable is here And in this walk-through I try to capture the table build and evolution experience ...
Table 6. Four grouping variables via piping interface !
ggtable(data = tidytitanic::tidy_titanic) |>
set_rows(c(sex, survived)) |>
set_cols(c(age, class))
## # A tibble: 4 × 9
## # Groups: sex, survived [4]
## sex survived Child_3rd Adult_1st Adult_2nd Adult_3rd Adult_Crew Child_1st
## <fct> <fct> <int> <int> <int> <int> <int> <int>
## 1 Male No 35 118 154 387 670 NA
## 2 Male Yes 13 57 14 75 192 5
## 3 Female No 17 4 13 89 3 NA
## 4 Female Yes 14 140 80 76 20 1
## # ℹ 1 more variable: Child_2nd <int>
Tables 7-10. A family of plots tables using last_table() with |> set_rows()/set_cols()
updates.
ggtable(data = titanic)
## n
## 1 2201
last_table() |>
set_cols(sex)
## # A tibble: 1 × 2
## Male Female
## <int> <int>
## 1 1731 470
last_table() |>
set_rows(survived)
## # A tibble: 2 × 3
## # Groups: survived [2]
## survived Male Female
## <fct> <int> <int>
## 1 No 1364 126
## 2 Yes 367 344
last_table() |>
set_cols(NULL)
## # A tibble: 2 × 2
## # Groups: survived [2]
## survived n
## <fct> <int>
## 1 No 1490
## 2 Yes 711
Table Type: static-HTML
Submission Type: Single Table Example
Table: tablebuildup
Code: GitHub - EvaMaeRey/tidypivot: Declarative group-wise count and compute, describing your target table; mytidytuesday/2024-05-27-r6/r6.Rmd at master · EvaMaeRey/mytidytuesday · GitHub
Language:
Industries: Given the commonness of the task, this project might be of broad interest: BI, government, reporting, sciences, etc.
Packages: tidypivot (which wraps dplyr and tidyr manipulation), R6 towards piping, ggplot2 (syntax, feel)