.groups = "drop" in summarize()

Hey everyone, I'm a complete beginner to R. Can someone explain to me what the highlighted part of the code means? Why does it say "drop" when the .groups = "drop" command actually helps include all the groups listed in our results? Thanks.

library(dplyr)

tg <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarize(mean = mean(len),
            longest = max(len),
            .groups = "drop")

The setting .groups = "drop" controls the grouping of the data frame that summarize() returns. The grouping of the data frame going into summarize() is set by the group_by() function. The .groups argument then lets you pick the grouping of the resulting data frame.
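To make the "grouping going in" versus "grouping coming out" distinction concrete, here is a small sketch that runs the same summary with three different .groups values and inspects the result with group_vars(). It assumes dplyr 1.0.0 or later, where the .groups argument exists; the object names (grouped, drop_last, keep, dropped) are just illustrative.

```r
library(dplyr)

# Grouping going in: set by group_by()
grouped <- ToothGrowth %>% group_by(supp, dose)
group_vars(grouped)    # "supp" "dose"

# Grouping coming out: chosen by .groups
drop_last <- grouped %>% summarize(mean = mean(len), .groups = "drop_last")
keep      <- grouped %>% summarize(mean = mean(len), .groups = "keep")
dropped   <- grouped %>% summarize(mean = mean(len), .groups = "drop")

group_vars(drop_last)  # "supp": only the last grouping variable is removed
group_vars(keep)       # "supp" "dose": same grouping as the input
group_vars(dropped)    # character(0): no grouping at all
```

All three results contain the same six rows and the same means. Only the grouping attached to the returned data frame differs.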

But why does it say "drop"? If we are dropping the groups, how does our function still work and group by both variables?

In your code

tg <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarize(mean = mean(len),
            longest = max(len),
            .groups = "drop")

ToothGrowth is grouped by the columns supp and dose and then passed to the summarize() function. The columns mean and longest are calculated for every combination of supp and dose. The output of summarize() is a data frame with the four columns supp, dose, mean, and longest. Because .groups is set to "drop", the resulting data frame, tg, is not grouped by any column. The calculation was performed on the grouped version of ToothGrowth, and only afterwards was the grouping dropped from tg.
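Where the choice starts to matter is in whatever you do with tg next. The sketch below (again assuming dplyr 1.0.0 or later) builds tg as in your code, then builds a second version, tg_last, with .groups = "drop_last" for comparison. The names tg_last and overall are made up for this illustration.

```r
library(dplyr)

tg <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarize(mean = mean(len), longest = max(len), .groups = "drop")

nrow(tg)           # 6: one row per supp/dose combination
is_grouped_df(tg)  # FALSE: the grouping was dropped after the calculation

# On the ungrouped tg, mean(mean) is taken over all six rows
tg %>% mutate(overall = mean(mean))

# Same summary, but the result stays grouped by supp
tg_last <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarize(mean = mean(len), longest = max(len), .groups = "drop_last")

# Now mean(mean) is taken separately within each supp
tg_last %>% mutate(overall = mean(mean))
```

So "drop" does not undo the grouped calculation. It only means that later steps treat tg as an ordinary, ungrouped data frame.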

Additional comment - Remember that grouping does not change the values in the data frame. It is an attribute of the data frame that tells functions like summarize() and mutate() how to process the data.
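A quick way to see this for yourself, as a minimal sketch: group ToothGrowth, check that the values are untouched, and then run the same mutate() on the grouped and ungrouped versions. The column name centered is hypothetical, chosen only for this example.

```r
library(dplyr)

grouped <- ToothGrowth %>% group_by(supp, dose)

# The values are the same; only the grouping information was added
identical(grouped$len, ToothGrowth$len)  # TRUE
is_grouped_df(grouped)                   # TRUE
group_vars(grouped)                      # "supp" "dose"

# Ungrouped: mean(len) is the mean of all 60 rows
plain <- ToothGrowth %>% mutate(centered = len - mean(len))

# Grouped: mean(len) is computed within each supp/dose combination
by_group <- grouped %>% mutate(centered = len - mean(len))

# ungroup() removes the grouping again without touching the values
group_vars(ungroup(by_group))            # character(0)
```

The centered column differs between plain and by_group even though the mutate() call is identical, because the grouping tells mutate() which rows to average over.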
