I'd like to request a new global option that controls the default behavior of dplyr::summarise with respect to the .groups argument. The .groups argument has been discussed before, and there's been a lot of frustration with the unnecessary messages. I agree with the author here that the most helpful default for me is to always drop the grouping after a summarise.
There are two reasons for this. First, I very rarely chain summarise operations one after another. In the rare case that I want nested calculations, with one summarise piped directly into another and the last grouping variable dropped, I want the new set of groups to be explicit instead of implicit, so I typically write an additional group_by instead. Second, the idea that the order of the groups matters is simply absurd to me. I don't write my code with a sense of "biggest group first, then the next biggest one, etc.". My groups are instead non-hierarchical and simply categories within which I'd like something calculated. Often I actually work in the opposite direction: I filter down to a single manageable working example to mimic a single group, then remove the filters and add additional groups to the end of the group_by call once I'm happy with the output. Unfortunately, adding groups to the end of the call without specifying .groups every time changes the grouping of the result, and this has caused me more than one headache.
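To make the headache concrete, here is a minimal sketch using the built-in mtcars data (my choice of example data, not something from a real script). It assumes the current default, under which summarise drops only the last grouping variable when each group yields one row.

```r
library(dplyr)

# One grouping variable: the default drops it, so the result is ungrouped.
one_group <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))
group_vars(one_group)
# character(0)

# Same code with one more group appended: only the last variable (gear)
# is dropped, so the result is now silently still grouped by cyl.
two_groups <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg))
group_vars(two_groups)
# "cyl"
```

Any later mutate or summarise on two_groups is then computed within cyl rather than over the whole table, which is the change in behavior I keep tripping over.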
A relatively friction-free way of handling this would be a new global option, similar to the existing options(dplyr.summarise.inform = FALSE), that controls this default behavior. That way old code isn't broken, but a single line at the top of the script (e.g. options(dplyr.summarise.groups = "drop")) makes the behavior clear throughout.
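As a rough sketch of what I have in mind: dplyr.summarise.groups below is the option I'm proposing and does not exist in dplyr today, so setting it has no effect on summarise itself. The wrapper summarise_opt is a hypothetical name that only emulates the idea by forwarding the option's value to .groups.

```r
library(dplyr)

# Existing option: only silences the regrouping message.
options(dplyr.summarise.inform = FALSE)

# HYPOTHETICAL option proposed in this request; dplyr does not read it.
options(dplyr.summarise.groups = "drop")

# Hypothetical wrapper emulating the proposal: if the option is unset,
# getOption() returns NULL and summarise falls back to its usual default.
summarise_opt <- function(.data, ...) {
  summarise(.data, ..., .groups = getOption("dplyr.summarise.groups"))
}

res <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise_opt(mean_mpg = mean(mpg))
group_vars(res)
# character(0)
```

The point of the request is that summarise itself would do this lookup, so no wrapper would be needed and existing scripts that never set the option would behave exactly as before.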
I'm aware of the "correct" way to do this, which is adding .groups = "drop" explicitly to every single summarise I use, as well as workarounds such as ungrouping after each call (unhelpful because the message is still emitted and it's an additional line of code), defining a new summarise_drop function that implements this behavior, or using the .by argument instead of the group_by function. All of these are far more annoying than setting a global option at the top of my script that controls the default behavior. If folks are aware of other workarounds or solutions that don't require a feature request, please let me know.
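For reference, this is roughly what those workarounds look like side by side, again on mtcars as stand-in data. summarise_drop is my own helper rather than a dplyr function, and the .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# 1. Explicit .groups on every call.
explicit <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# 2. Ungroup afterwards (the message is still emitted by summarise).
ungrouped <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  ungroup()

# 3. User-defined helper that always drops the grouping.
summarise_drop <- function(.data, ...) {
  summarise(.data, ..., .groups = "drop")
}
helper <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise_drop(mean_mpg = mean(mpg))

# 4. Per-operation grouping with .by, which always returns ungrouped data.
by_arg <- mtcars %>%
  summarise(mean_mpg = mean(mpg), .by = c(cyl, gear))
```

All four return an ungrouped result, but each one has to be repeated at every call site, which is exactly what a single global option would avoid.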