Is ungroup() recommended after every group_by()?

jtr13 · February 16, 2018, 1:21am

I once had a problem that was solved with ungroup(), so I started using it all the time, but I'm wondering whether it's really necessary. Would love to hear what others do.

jasonparker · February 16, 2018, 4:39am

I tend to use ungroup() after every group_by() for a few reasons:

Avoid potential unintended errors due to the grouping.
Makes pipes more readable by explicitly pointing out places where the data is being operated on according to groups.
I like to save transformed datasets as .Rdata objects to speed up loading times for scripts I run often and Shiny apps. The groupings are retained in such objects. By ensuring that I always ungroup, I avoid situations where I load an .Rdata object a year later and struggle with a problem not realizing a grouping has been applied.
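
To illustrate the last point, here is a minimal sketch, assuming dplyr is installed. The object names `grouped` and `reloaded` and the temporary file path are made up for the example. It saves a grouped data frame with save() and checks that the grouping is still there after load():

```r
library(dplyr)

# Hypothetical example data: the built-in Titanic table, grouped by two columns
grouped <- data.frame(Titanic) %>%
  group_by(Class, Age)

# Save to a temporary .Rdata file, remove the object, then load it back
path <- tempfile(fileext = ".Rdata")
save(grouped, file = path)
rm(grouped)
load(path)  # restores an object named `grouped`
reloaded <- grouped

is_grouped_df(reloaded)  # TRUE: the grouping survived the round trip
group_vars(reloaded)     # "Class" "Age"

# Ungrouping before saving avoids the surprise
is_grouped_df(ungroup(reloaded))  # FALSE
```

Checking with is_grouped_df() or group_vars() right after loading an old object is a cheap way to find out whether a forgotten grouping is still attached.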

jtr13 · February 16, 2018, 3:16pm

Hi,

That's very helpful... I will continue to ungroup. I was going to ask if you had an example in which not doing so caused a problem but I answered my own question (by chance). Here's a MWE that reproduces the error I got:

> data.frame(Titanic) %>% 
     group_by(Class, Age) %>% 
     summarize(Freq = sum(Freq)) %>% 
     mutate(Class = reorder(Class, Freq))
Error in mutate_impl(.data, dots) : 
  Column `Class` can't be modified because it's a grouping variable

Note that it doesn't happen with just one group_by variable since summarize() removes the last grouping variable:

> data.frame(Titanic) %>% 
     group_by(Class) %>% 
     summarize(Freq = sum(Freq)) %>%  
     mutate(Class = reorder(Class, Freq))
# A tibble: 4 x 2
  Class  Freq
  <fct> <dbl>
1 1st     325
2 2nd     285
3 3rd     706
4 Crew    885
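
For comparison, here is a sketch of the two-variable case with ungroup() added before the mutate(), assuming a plain library(dplyr) session. The name `fixed` is only for illustration. After summarize() the result is still grouped by Class, and ungroup() drops that remaining grouping so the column can be modified:

```r
library(dplyr)

fixed <- data.frame(Titanic) %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq)) %>%  # result is still grouped by Class
  ungroup() %>%                    # drop the remaining grouping
  mutate(Class = reorder(Class, Freq))

is_grouped_df(fixed)  # FALSE
nrow(fixed)           # 8, one row per Class/Age combination
```

In dplyr 1.0.0 and later, summarize() also accepts `.groups = "drop"`, which removes all grouping in the same step; that argument did not exist when this thread was written.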

EDIT (in response to @danr's post): To sum up the context: I am not asking for help debugging this code. I know that the problem is that I didn't ungroup(). The point is to illustrate why it's important to use ungroup().

danr · February 16, 2018, 4:22pm

group_by adds metadata to a data.frame that marks how rows should be grouped. As long as that metadata is there you won't be able to change the factors of the columns involved in the grouping. See the following examples.

You should use a reproducible example for your code. See:

https://www.jessemaegan.com/post/so-you-ve-been-asked-to-make-a-reprex

As your code stands, it isn't possible to tell whether you meant to use plyr::summarize or dplyr::summarize.

Also, a reprex makes it possible for us to just copy and paste your code and run it in the same environment that you did. Everyone here is answering questions on their own time, so we ask that you do what you can to minimize that time... a reprex is the best way to do that.

suppressPackageStartupMessages(library(dplyr))

# first of all, dplyr::group_by adds meta-data to
# the data.frame that other functions, like
# dplyr::summarize, use when they do calculations

t1 <- data.frame(Titanic) %>%
   group_by(Class, Age)

# notice that the meta-data show how rows
# should be grouped
str(t1)
#> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  32 obs. of  5 variables:
#>  $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 1 2 3 4 1 2 3 4 1 2 ...
#>  $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 2 2 2 2 1 1 ...
#>  $ Age     : Factor w/ 2 levels "Child","Adult": 1 1 1 1 1 1 1 1 2 2 ...
#>  $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ Freq    : num  0 0 35 0 0 0 17 0 118 154 ...
#>  - attr(*, "vars")= chr  "Class" "Age"
#>  - attr(*, "drop")= logi TRUE
#>  - attr(*, "indices")=List of 8
#>   ..$ : int  0 4 16 20
#>   ..$ : int  8 12 24 28
#>   ..$ : int  1 5 17 21
#>   ..$ : int  9 13 25 29
#>   ..$ : int  2 6 18 22
#>   ..$ : int  10 14 26 30
#>   ..$ : int  3 7 19 23
#>   ..$ : int  11 15 27 31
#>  - attr(*, "group_sizes")= int  4 4 4 4 4 4 4 4
#>  - attr(*, "biggest_group_size")= int 4
#>  - attr(*, "labels")='data.frame':   8 obs. of  2 variables:
#>   ..$ Class: Factor w/ 4 levels "1st","2nd","3rd",..: 1 1 2 2 3 3 4 4
#>   ..$ Age  : Factor w/ 2 levels "Child","Adult": 1 2 1 2 1 2 1 2
#>   ..- attr(*, "vars")= chr  "Class" "Age"
#>   ..- attr(*, "drop")= logi TRUE
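
As a rough sketch of the other direction (assuming only dplyr is loaded; `t2` is a name introduced here for illustration), ungroup() strips that grouping metadata again, which can be checked without reading the full str() output:

```r
library(dplyr)

t1 <- data.frame(Titanic) %>%
  group_by(Class, Age)

# The grouped object carries the extra 'grouped_df' class
class(t1)     # "grouped_df" "tbl_df" "tbl" "data.frame"
n_groups(t1)  # 8 (4 classes x 2 age levels)

# ungroup() removes the grouping metadata and the 'grouped_df' class
t2 <- ungroup(t1)
class(t2)       # "tbl_df" "tbl" "data.frame"
group_vars(t2)  # character(0)
```

The exact attribute names shown by str() depend on the dplyr version, so helper functions such as group_vars() and n_groups() are a more stable way to inspect grouping than reading the attributes directly.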