Create a summary table of counts in a pipe chain

tiernan · October 2, 2017, 4:20pm

Let's say I had a dataset with two categorical variables and I want to create a summary table that shows two things:

The number of instances (counts) of each combination of the categorical variables
The number of instances of one categorical variable by itself

Is there an elegant way of producing this summary table within a pipe workflow? My current approach works but the use of {} feels awkward:

library(dplyr) 

starwars %>% 
  filter(gender %in% c("male", "female"), 
         eye_color %in% c("brown", "blue", "black")) %>% 
  { bind_rows(count(., gender, eye_color), 
              count(., eye_color) %>% mutate(gender = "any")) } %>% 
  arrange(gender, desc(n))

#> # A tibble: 9 x 3
#>   gender eye_color     n
#>   <chr>  <chr>     <int>
#> 1 any    brown        21
#> 2 any    blue         19
#> 3 any    black         9
#> 4 female blue          6
#> 5 female brown         5
#> 6 female black         2
#> 7 male   brown        16
#> 8 male   blue         13
#> 9 male   black         7

cderv · October 2, 2017, 4:41pm

Your way seems ok. This use of the pipe is not so awkward - it is pretty clear when reading if you know the meaning of data %>% {f(.)}

You could do this way using how group_by works when called with several groups.

library(dplyr, warn.conflicts = F) 

starwars %>% 
  filter(gender %in% c("male", "female"), 
         eye_color %in% c("brown", "blue", "black")) %>%
  group_by(eye_color, gender) %>%
  tally() %>%
  bind_rows(summarise(., n = sum(n)) %>% mutate(gender = "any")) %>%
  arrange(gender, desc(n))
#> # A tibble: 9 x 3
#> # Groups:   eye_color [3]
#>   eye_color gender     n
#>       <chr>  <chr> <int>
#> 1     brown    any    21
#> 2      blue    any    19
#> 3     black    any     9
#> 4      blue female     6
#> 5     brown female     5
#> 6     black female     2
#> 7     brown   male    16
#> 8      blue   male    13
#> 9     black   male     7

However, even if { } is not used, I am not sure it is clearer.

dylanjm · October 2, 2017, 4:42pm

Not quite sure how to answer your second problem. Here is an idea:

starwars %>%
  group_by(gender, eye_color) %>%
  filter(gender %in% c("male", "female"), 
          eye_color %in% c("brown", "blue", "black")) %>%
  summarise(n = n())

rkahne · October 2, 2017, 5:31pm

My way to solve that problem involves a join, so I think your {} notation is a significant improvement! I'm actually going to start doing that!

alistaire · October 3, 2017, 2:33am

Honestly, your original is about as clear as you can get, but here's an option with Sam Firke's janitor package:

library(tidyverse)

starwars %>% 
    filter(gender %in% c("male", "female"), 
           eye_color %in% c("brown", "blue", "black")) %>% 
    janitor::crosstab(gender, eye_color) %>% 
    janitor::adorn_totals('row') %>% 
    gather(eye_color, n, -gender) %>% 
    arrange(gender, desc(n))
#>   gender eye_color  n
#> 1 female      blue  6
#> 2 female     brown  5
#> 3 female     black  2
#> 4   male     brown 16
#> 5   male      blue 13
#> 6   male     black  7
#> 7  Total     brown 21
#> 8  Total      blue 19
#> 9  Total     black  9

which mirrors how you could do this in base R with table and addmargins:

library(dplyr)

starwars %>% 
    filter(gender %in% c("male", "female"), 
           eye_color %in% c("brown", "blue", "black")) %>% 
    select(gender, eye_color) %>% 
    table() %>% 
    addmargins(1) %>% 
    as_data_frame() %>% 
    arrange(gender, desc(n))
#> # A tibble: 9 x 3
#>   gender eye_color     n
#>    <chr>     <chr> <dbl>
#> 1 female      blue     6
#> 2 female     brown     5
#> 3 female     black     2
#> 4   male     brown    16
#> 5   male      blue    13
#> 6   male     black     7
#> 7    Sum     brown    21
#> 8    Sum      blue    19
#> 9    Sum     black     9

nutterb · October 3, 2017, 10:32am

You can omit the {} construct by using a mutate in your bind_rows.

starwars %>% 
  filter(gender %in% c("male", "female"), 
         eye_color %in% c("brown", "blue", "black")) %>% 
  
  bind_rows(.,
            # duplicate the data, but recategorize all gender as 'any'
            mutate(., gender = "any")) %>% 
  count(., gender, eye_color) %>% 
  arrange(gender, desc(n))

It's no less awkward than what you've done, but I would recommend including a comment as it isn't a conventional thing to do within a pipe.

alexilliamson · October 3, 2017, 2:27pm

This isn't as concise, but I think an intermediate assignment - like to a 'counts' variable - may help

library(dplyr, warn.conflicts = F) 


counts <- starwars %>% 
  filter(gender %in% c("male", "female"), 
         eye_color %in% c("brown", "blue", "black")) %>%
  count(gender, eye_color) 

counts %>%
  group_by(eye_color) %>%
  summarise(n = sum(n)) %>%
  transmute(gender = 'any', eye_color, n) %>%
  bind_rows(counts)

thoughtfulnz · October 3, 2017, 6:54pm

How about relying on logical to numerical type conversion,

group by eye colour, then

summarise(malen = sum(gender=="male"), femalen = sum(gender=="female"), all = n())

and if you want the data in long format, use gather from tidyr