Use tidyverse to add totals to a pivot data frame

communy · March 22, 2022, 12:48pm

I frequently have to deal with pivot tables and want I way add a total row.
For practicality I prefer to stay within the tidyverse, i.e. not load additional packages (like janitor that has a total function).
Making things a little more tricky, frequently the pivot tables contain missing values (NA).

I calculate the pivot table from larger data frames mainly using the group_by, count and pivot_wider functions.

This sample data frames mimic a such a pivot table:

df <- tribble(~category, ~value1, ~value2,
        "cat1", 12, 6,
        "cat2", 8, 2,
        "cat3", 1, NA,
        "cat4", NA, 6) |> 
  mutate(across(where(is.numeric), as.integer)) # mutate column type to mimic pivot table structure

However, this is just for demonstration - I would like to attach the code to produce a total row directly after group_by, count and pivot_wider.

On stackoverflow I found this solution to make attach a total row...

bind_rows(summarise(.,
                      across(where(is.numeric), sum),
                      across(where(is.character), ~"Total")))

... but it doesn't work - neither the summarise part, nor the bind_rows part.

I modified it make summarizing possible:

df |> 
  summarise(across(where(is.character), ~"Total"),
            across(where(is.numeric), ~sum(., na.rm = T)))
# works to summarise all columns

Combining summarise and bind_rows to attach the totals directly to the data is not possible, an error message related to across is returned:

df |> 
  bind_rows(summarise(across(where(is.character), ~"Total")),
                      across(where(is.numeric), sum(na.rm = TRUE)))
# Error: `across()` must only be used inside dplyr verbs.

How can this problem be solved?

In addition, when combining the summary functions directly after pivoting the original data frame, I also get an error message that the df contains grouped data - which I find strange, since in my understanding it should only use the pivoted data as a source.

nirgrahamuk · March 22, 2022, 1:10pm

bind_rows(
  df,
  summarise(
    df,
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    across(where(is.character), ~"Total")
  )
)

communy · March 22, 2022, 1:14pm

Thank you, but this does solve my problem only partially. I want to add the code with a pipe to the pivot table code originating from the raw data - that why I think is must be:

df |> bind_rows ...

nirgrahamuk · March 22, 2022, 1:28pm

If you want to pipe df into some code that applies your binding and summarisation you can do it

explicitly/with a defined function

domysummary <- function(x){
  bind_rows(
    x,
    summarise(
      x,
      across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
      across(where(is.character), ~"Total")
    )
  )
}

df |> domysummary()

implicitly with an inline function

df |> { x <- . 
  bind_rows(
    x,
    summarise(
      x ,
      across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
      across(where(is.character), ~"Total")
    )
  )}

communy · March 23, 2022, 10:21am

Unfortunately, both your suggestion don't work.

The result is that the complete pivot data frame is appended to itself, only that in the append "duplicates" NA is replaced with 0.

If piping directly after pivoting the original data frame, the error message is:

Error: function '{' not supported in RHS call of a pipe

If piping after the pivot data frame (saved df after pivoting the original data frame, i.e. df |> ...), the error message is:

Error: unexpected '}' in:
"                )
              )}"

nirgrahamuk · March 23, 2022, 10:22am

Sorry, I use traditional %>% magrittr/dplyr pipes, I suppose they may be more flexible than the R4.1 |> pipes, I should not have assumed they would work the same.

communy · March 23, 2022, 10:30am

I just switched RStudio to use %>% instead of |> - doesn't change anything, the error messages are exactly the same.

nirgrahamuk · March 23, 2022, 3:09pm

I tested the code again to confirm it does work for me.
Reposting in full to avoid any copy and paste issues you might face

library(tidyverse)

df <- tribble(~category, ~value1, ~value2,
              "cat1", 12, 6,
              "cat2", 8, 2,
              "cat3", 1, NA,
              "cat4", NA, 6) %>% 
  mutate(across(where(is.numeric), as.integer))

df %>% { x <- . 
bind_rows(
  x,
  summarise(
    x ,
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    across(where(is.character), ~"Total")
  )
)}

results in

# A tibble: 5 x 3
  category value1 value2
  <chr>     <int>  <int>
1 cat1         12      6
2 cat2          8      2
3 cat3          1     NA
4 cat4         NA      6
5 Total        21     14

sessionInfo

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.8     purrr_0.3.4     readr_2.1.2     tidyr_1.2.0    
[7] tibble_3.1.6    ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8       cellranger_1.1.0 pillar_1.7.0     compiler_4.0.5   dbplyr_2.1.1     tools_4.0.5     
 [7] digest_0.6.25    evaluate_0.14    jsonlite_1.7.3   lubridate_1.8.0  lifecycle_1.0.1  gtable_0.3.0    
[13] pkgconfig_2.0.3  rlang_1.0.1      reprex_2.0.1     rstudioapi_0.13  DBI_1.0.0        cli_3.1.1       
[19] yaml_2.2.0       xfun_0.29        haven_2.4.3      fastmap_1.1.0    knitr_1.23       xml2_1.3.3      
[25] withr_2.2.0      httr_1.4.2       fs_1.5.2         generics_0.1.0   vctrs_0.3.8      hms_1.1.1       
[31] grid_4.0.5       tidyselect_1.1.1 glue_1.6.2       R6_2.4.1         fansi_0.4.1      readxl_1.3.1    
[37] rmarkdown_1.13   tzdb_0.1.1       modelr_0.1.8     magrittr_2.0.2   htmltools_0.5.2  backports_1.1.7 
[43] scales_1.1.1     ellipsis_0.3.2   rvest_1.0.2      assertthat_0.2.1 colorspace_1.4-1 utf8_1.1.4      
[49] stringi_1.4.6    munsell_0.5.0    broom_0.7.12     crayon_1.4.2

Andrzej · March 23, 2022, 4:29pm

Hi Nir , could you please elaborate a bit please what is that syntax above, I mean why curly braces, no comma after dot, why do you need x inside that syntax ? Does x consist of all columns of df ?
best,
Andrzej

nirgrahamuk · March 23, 2022, 4:36pm

Hi Andrzej,
Its not in fact necessary, I just did it to put visual emphasis for the benefit of someone that might have less experience, my idea being that the outer curly braces {} show clearly the space in which I'm using the %>%'s . placeholder, but also because . is quite a small mark, I simply assigned it to an easier to read symbol, in this case x.
the following code works


df %>%  
bind_rows(
  .,
  summarise(
    . ,
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    across(where(is.character), ~"Total")
  )
)

Andrzej · March 23, 2022, 4:39pm

Thank you very much for your kind explanation.

communy · March 23, 2022, 8:48pm

Thank you again @nirgrahamuk . Indeed, you code works in RStudio when using %>% to pipe - but not with |> (Error in list2(...) : object '.' not found).

Strangely, it only works with the example df provided above.

But it doesn't work with my original pivot df (which I also used before with your code ) - strange, since it is very similar. The reason must be how it was created - with this code structure:

original_df |> 
  group_by(FstArt, BL) |> 
  count() |> 
  pivot_wider(names_from = BL,
              values_from = n)

The console result is:

# A tibble: 6 × 3
# Groups:   FstArt [6]
  FstArt        BW     B
  <chr>      <int> <int>
1 Einzelfund    12     6
2 Grab           8     2
3 Grab?          1    NA
4 Gräberfeld     2    NA
5 Grabgruppe     6    NA
6 Siedlung      NA     1

However, when I send it to the clipboard and paste it as tribble (clipr, datapasta), the code is

tibble::tribble(
       ~FstArt, ~BW, ~B,
  "Einzelfund", 12L, 6L,
        "Grab",  8L, 2L,
       "Grab?",  1L, NA,
  "Gräberfeld",  2L, NA,
  "Grabgruppe",  6L, NA,
    "Siedlung",  NA, 1L
  )

... and even more strange to me - if I take the resulting tibble the code to add a total row works fine, but not if I use a tibble saved after pivoting the original df.

Any idea?

nirgrahamuk · March 23, 2022, 9:03pm

When you construct your tribble it has no groups, but your problematic frame has groups. You can use ungroup() to solve that.

communy · March 23, 2022, 9:47pm

Thank you very much - with ungroup() it works perfectly.
… how simple solution can be...

system · March 30, 2022, 9:47pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.