I use group_by multiple times in my code, which is great because it's very useful! However, with great power comes great responsibility: I am responsible for ungrouping the tibble, which I sometimes forget to do. Functions such as add_count and add_tally are excellent in this respect because they free the user from the burden of remembering to ungroup every time.
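To illustrate what I mean about add_count, here is a minimal sketch on a made-up tibble (the name toy and its values are only for illustration, not my real data). It compares the manual group/mutate/ungroup route with add_count, which handles the grouping internally and hands back a tibble with the same grouping as its input:

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
toy <- tibble(group = c("A", "A", "B"), x = 1:3)

# Manual route: group, count per group, then remember to ungroup
manual <- toy %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  ungroup()

# add_count() groups internally and restores the grouping of its input,
# so an ungrouped input gives an ungrouped result
auto <- add_count(toy, group)

is_grouped_df(manual)
is_grouped_df(auto)
identical(manual$n, auto$n)
```

With add_count there is no ungroup step to forget, which is the behaviour I would like to have for slice as well.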
In my use case, I often need to summarize my data frame by retaining only the first or the last element of each group, i.e., my summary function is slice. Example:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
# generate sample data
n <- 100
ngroups <- 10
my_df <- tibble(
  x = runif(n * ngroups),
  y = rnorm(n * ngroups),
  group = rep(LETTERS[1:ngroups], each = n)
)
# slice
my_df %<>%
  group_by(group) %>%
  slice(1) %>%
  ungroup()
my_df
#> # A tibble: 10 x 3
#>         x      y group
#>     <dbl>  <dbl> <chr>
#>  1 0.136   0.647 A
#>  2 0.606   1.22  B
#>  3 0.919  -0.712 C
#>  4 0.0421  0.634 D
#>  5 0.199  -0.229 E
#>  6 0.413  -0.343 F
#>  7 0.699  -0.750 G
#>  8 0.725  -0.183 H
#>  9 0.722  -0.172 I
#> 10 0.158  -1.13  J
Created on 2018-08-25 by the reprex package (v0.2.0).
As you can see, I used the group_by() + slice() + ungroup() pattern.
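For now, the best I can think of is wrapping the pattern in a small helper of my own. The sketch below is only that: keep_one_per_group and its last argument are hypothetical names I made up, not part of dplyr, and the toy tibble is invented for the example. It is written without pipes on purpose:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group by the given columns,
# keep the first (or last) row of each group, and always ungroup at the end
keep_one_per_group <- function(.data, ..., last = FALSE) {
  grouped <- group_by(.data, ...)
  sliced <- if (last) slice(grouped, n()) else slice(grouped, 1)
  ungroup(sliced)
}

# Hypothetical toy data, used only for this illustration
toy <- tibble(group = rep(c("A", "B"), each = 3), x = 1:6)

first_rows <- keep_one_per_group(toy, group)
last_rows <- keep_one_per_group(toy, group, last = TRUE)

is_grouped_df(first_rows)
```

This removes the risk of forgetting ungroup, but I would have to write a similar wrapper for every grouped verb I use, which is why I am asking whether something built in exists.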
Question: is there a dplyr function which corresponds to this pattern? If not, is there some useful trick to avoid forgetting to ungroup in such a situation? Of course my real use case is much more complex, i.e., the function is longer and not always based on pipes (I don't use pipes for very long dplyr workflows, or for functions which need to be called a large number of times).