Summarize by distinct count of another column (Trying to replicate Excel's pivot table feature, 'summarize by distinct count' and 'summarize by distinct count percent of grant total')

FJCC · May 15, 2022, 6:30pm

Getting the information in the pivot table is easy enough. Melding the two types of summaries together takes a little more work and, frankly, I find the combination confusing. For example, the 3 and the 2 in your summary of Brazil by Type do not result in the 4 in the Brazil Total row. And the 5 and the 100% in your Grand Total are not obviously the sum of any of the preceding numbers. However, this gets you most of the way to the final table.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
customer_id <- c('A1', 'A1', 'A1', 'B2', 'B2', 'C3', 'C3', 'D4', 'D4', 'E5')
country <- c('Brazil', 'Brazil', 'Peru', 'Brazil', 'Costa Rica', 'Brazil', 'Peru', 'Brazil', 'Costa Rica', 'Hawaii')
type <- c('Arabica', 'Robusta', 'Excelsa', 'Arabica', 'Robusta', 'Arabica', 'Excelsa','Robusta', 'Excelsa', 'Arabica')

df <- data.frame(customer_id, country, type)
df
#>    customer_id    country    type
#> 1           A1     Brazil Arabica
#> 2           A1     Brazil Robusta
#> 3           A1       Peru Excelsa
#> 4           B2     Brazil Arabica
#> 5           B2 Costa Rica Robusta
#> 6           C3     Brazil Arabica
#> 7           C3       Peru Excelsa
#> 8           D4     Brazil Robusta
#> 9           D4 Costa Rica Excelsa
#> 10          E5     Hawaii Arabica

TotalCust <- length(unique(df$customer_id))

SummCountryType <- df |> group_by(country, type) |> 
  summarize(N = n()) |> 
  mutate(Perc = N/TotalCust)
#> `summarise()` has grouped output by 'country'. You can override using the
#> `.groups` argument.
SummCountryType
#> # A tibble: 6 x 4
#> # Groups:   country [4]
#>   country    type        N  Perc
#>   <chr>      <chr>   <int> <dbl>
#> 1 Brazil     Arabica     3   0.6
#> 2 Brazil     Robusta     2   0.4
#> 3 Costa Rica Excelsa     1   0.2
#> 4 Costa Rica Robusta     1   0.2
#> 5 Hawaii     Arabica     1   0.2
#> 6 Peru       Excelsa     2   0.4

SummaryCountry <- df |> group_by(country) |> 
  summarize(N = length(unique(customer_id))) |> 
  mutate(Perc = N/TotalCust)
SummaryCountry
#> # A tibble: 4 x 3
#>   country        N  Perc
#>   <chr>      <int> <dbl>
#> 1 Brazil         4   0.8
#> 2 Costa Rica     2   0.4
#> 3 Hawaii         1   0.2
#> 4 Peru           2   0.4

SummaryCountry$type <- ""
GlueFunc <- function(Country) {
  tmp1 <- filter(SummCountryType, country == Country)
  tmp2 <- filter(SummaryCountry, country == Country) |> 
    mutate(country = paste(country, "Total"))
  rbind(tmp1, tmp2)
}

COUNTRIES <- unique(df$country)

FINAL <- map_dfr(COUNTRIES, GlueFunc)
FINAL
#> # A tibble: 10 x 4
#> # Groups:   country [8]
#>    country          type          N  Perc
#>    <chr>            <chr>     <int> <dbl>
#>  1 Brazil           "Arabica"     3   0.6
#>  2 Brazil           "Robusta"     2   0.4
#>  3 Brazil Total     ""            4   0.8
#>  4 Peru             "Excelsa"     2   0.4
#>  5 Peru Total       ""            2   0.4
#>  6 Costa Rica       "Excelsa"     1   0.2
#>  7 Costa Rica       "Robusta"     1   0.2
#>  8 Costa Rica Total ""            2   0.4
#>  9 Hawaii           "Arabica"     1   0.2
#> 10 Hawaii Total     ""            1   0.2

^{Created on 2022-05-15 by the reprex package (v2.0.1)}