In the process of trying to pinpoint when ungroup() is important, I realized I may not understand how group_by() works as well as I had thought, especially when it is used in combination with summarize().
In particular, I didn't realize that the order of variables within group_by() matters. It appears that after a summarize(), only the first grouping variable remains grouped. This result was totally unintuitive to me, so I figured I'd ask: does this make sense to everyone, or is it a bug?
For example, if I adapt the example from this thread:
```r
library(tidyverse)

data.frame(Titanic) %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq)) %>%
  mutate(Class = reorder(Class, Freq))
#> Error in mutate_impl(.data, dots): Column `Class` can't be modified because it's a grouping variable

# When I switch the order within group_by(), it works
data.frame(Titanic) %>%
  group_by(Age, Class) %>%
  summarize(Freq = sum(Freq)) %>%
  mutate(Class = reorder(Class, Freq))
#> # A tibble: 8 x 3
#> # Groups:   Age [2]
#>   Age   Class  Freq
#>   <fct> <fct> <dbl>
#> 1 Child 1st       6
#> 2 Child 2nd      24
#> 3 Child 3rd      79
#> 4 Child Crew      0
#> 5 Adult 1st     319
#> 6 Adult 2nd     261
#> 7 Adult 3rd     627
#> 8 Adult Crew    885
```
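A version-independent way to make the first pipeline work is to call ungroup() right after summarize(), so that Class is no longer a grouping variable by the time mutate() modifies it. This is a sketch of one possible fix, not the only one; it loads just dplyr rather than the whole tidyverse, and `class_age_freq` is an illustrative name. Note that on ungrouped data reorder() ranks each class by its mean Freq across both age groups.

```r
library(dplyr)

class_age_freq <- data.frame(Titanic) %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq)) %>%       # result is still grouped by Class
  ungroup() %>%                         # remove the leftover Class grouping
  mutate(Class = reorder(Class, Freq))  # Class is now an ordinary column

# reorder() orders levels by mean Freq per class (its default FUN is mean)
levels(class_age_freq$Class)  # "2nd" "1st" "3rd" "Crew"
```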
@martin.R is spot on. Every summarize() drops the last grouping variable, because otherwise any following operation would effectively be row-wise: by definition, there is only one row per original group. But don't feel like you should have known this; I agree it's not intuitive. Every few months someone files this as a bug on the dplyr GitHub repository. I once suggested that dplyr should print an informational message explaining the dropped grouping, but the executive decision was "this is by design".
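The peeling-off behaviour can be checked directly with group_vars(). The sketch below (the `step1`/`step2`/`step3` names are illustrative) groups by three variables and calls summarize() repeatedly: each call collapses the data to one row per current group and leaves it grouped by all but the last variable. If summarize() had kept Class, Sex and Age, every group in `step1` would contain exactly one row, which is the row-wise situation described above.

```r
library(dplyr)

titanic <- data.frame(Titanic)

step1 <- titanic %>%
  group_by(Class, Sex, Age) %>%
  summarize(Freq = sum(Freq))         # 16 rows, one per Class/Sex/Age
group_vars(step1)                     # "Class" "Sex"  (Age peeled off)

step2 <- step1 %>% summarize(Freq = sum(Freq))  # 8 rows, one per Class/Sex
group_vars(step2)                     # "Class"        (Sex peeled off)

step3 <- step2 %>% summarize(Freq = sum(Freq))  # 4 rows, one per Class
group_vars(step3)                     # character(0)   (no grouping left)
```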
When to ungroup is sort of a different question. You can do all sorts of things with grouped data and not have any issue. Until you do. So my really unsophisticated approach is to ungroup when grouped data doesn't do what I need it to. Yeah, that's not a very informative heuristic, I realize. My single most common reason for calling ungroup() is that I want to drop a variable that's part of the grouping: dplyr will not let you drop a grouping variable.
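To illustrate that last point, here is a small sketch (`class_freq` and the column-name variables are illustrative): on the still-grouped summary, select() quietly adds the grouping variable back, and only after ungroup() can it actually be dropped.

```r
library(dplyr)

class_freq <- data.frame(Titanic) %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq))   # still grouped by Class

# Asking for only Age and Freq: select() keeps Class anyway
# (with an "Adding missing grouping variables" message)
grouped_cols <- names(select(class_freq, Age, Freq))
grouped_cols    # "Class" "Age" "Freq"

# After ungroup(), Class can really be dropped
ungrouped_cols <- names(select(ungroup(class_freq), Age, Freq))
ungrouped_cols  # "Age" "Freq"
```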
Some people get rather prescriptive and make sure that for every group_by there's an associated ungroup as soon as possible. I can't live with that level of fascism, myself. But if others need heavy rules in order to sleep at night, then maybe it's a good idea.
I didn't realise we'd skirt this close to Godwin's Law over ungroup()!
My experience is that forgetting to ungroup has bitten me, and that leaving data grouped has no advantage unless you explicitly need that grouping later in the chain. Other things keep me awake at night instead...
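One common way a leftover grouping bites, sketched with the Titanic data from the question (`class_age`, `within_class` and `overall` are illustrative names): after summarize() the result is still grouped by Class, so a later sum(Freq) inside mutate() is computed per class rather than over the whole table.

```r
library(dplyr)

class_age <- data.frame(Titanic) %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq))   # result is still grouped by Class

# Still grouped: sum(Freq) is the total within each class
within_class <- class_age %>% mutate(share = Freq / sum(Freq))

# Ungrouped: sum(Freq) is the grand total over all rows
overall <- class_age %>% ungroup() %>% mutate(share = Freq / sum(Freq))

sum(within_class$share)  # about 4: shares sum to 1 within each of 4 classes
sum(overall$share)       # about 1: shares sum to 1 over the whole table
```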
^ This helps! I hadn't thought of it this way before. I like thinking of group_by(x, y) and ungroup() as markers that say "everything between these two functions is grouped by x and y," but it looks like I need to adjust my mental model in the case of summarize().
At this point, I still think the automatic dropping of a group (I didn't realize it was specifically the last grouping variable; thanks, @martin.R) is easy to forget if you're not paying close attention. For whatever reason, keeping all groupings (or even dropping all of them) after summarize() feels more intuitive to me.
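For what it's worth, newer dplyr releases (1.0.0 and later) let you state this choice explicitly through a `.groups` argument to summarize(), which accepts "drop_last" (the behaviour discussed in this thread), "drop", "keep" or "rowwise"; those versions also print a message when the default drops the last grouping. A minimal sketch, assuming dplyr 1.0.0 or later (`keep_all` and `drop_all` are illustrative names):

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where summarize() has .groups

titanic <- data.frame(Titanic)

# Keep the same grouping structure as the input
keep_all <- titanic %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq), .groups = "keep")

# Drop every grouping level
drop_all <- titanic %>%
  group_by(Class, Age) %>%
  summarize(Freq = sum(Freq), .groups = "drop")

group_vars(keep_all)  # "Class" "Age"
group_vars(drop_all)  # character(0)
```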