Hi there,
I was trying to answer another post on this forum where I was forced to forget all my old tidy eval knowledge and try to implement the new dplyr 1.0.0 logic.
It didn't go very well, and it took me over an hour to wrap my head around it, but I guess that's just because I'm not used to the new functions yet. Anyway, I found what appears to be an inconsistency in how the all_of() function behaves inside different dplyr functions, and I was wondering if anyone could clarify this.
all_of and select()
library(dplyr)
myData = data.frame(x = c("A", "B"), y = 1:6)
myColumn = "x"
myData %>% select(all_of(myColumn))
x
1 A
2 B
3 A
4 B
5 A
6 B
Here, all_of() can be used directly inside select(): it takes a string (or a character vector of strings) and selects the columns with those names.
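To illustrate the "vector of strings" case, here is a minimal sketch. The extra column z and the variable myColumns are made up for this example and are not part of my original data.

```r
library(dplyr)

# Same data as above, plus a hypothetical third column z
myData = data.frame(x = c("A", "B"), y = 1:6, z = 6:1)

# Hypothetical character vector holding several column names
myColumns = c("x", "z")

# all_of() selects every column named in the vector, in that order
selected = myData %>% select(all_of(myColumns))
names(selected)
```

As far as I can tell, all_of() is strict: if one of the names in the vector is not a column of the data, select() throws an error instead of silently dropping it.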
all_of and group_by()
myData = data.frame(x = c("A", "B"), y = 1:6)
myColumn = "x"
myData %>% group_by(all_of(myColumn)) %>% summarise(y = sum(y))
# A tibble: 1 x 2
`all_of(myColumn)` y
<chr> <int>
1 x 21
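This output is not what you'd expect: the data are not grouped by the values of x, and instead I get a single group in a new column called `all_of(myColumn)` that just contains the string "x". After searching for a long time and trying out different things, I could fix it by wrapping all_of() in the across() function: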
myData = data.frame(x = c("A", "B"), y = 1:6)
myColumn = "x"
myData %>% group_by(across(all_of(myColumn))) %>% summarise(y = sum(y))
# A tibble: 2 x 2
x y
<chr> <int>
1 A 9
2 B 12
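The same across(all_of(...)) pattern also seems to work with more than one grouping column. A minimal sketch, assuming a hypothetical second grouping column g and a variable myColumns that are not in my original example:

```r
library(dplyr)

# Hypothetical data: x and g are both grouping columns
myData = data.frame(x = c("A", "B"), g = c("u", "u", "v"), y = 1:6)

# Hypothetical character vector of grouping column names
myColumns = c("x", "g")

# across(all_of(...)) hands the named columns to group_by()
result = myData %>%
  group_by(across(all_of(myColumns))) %>%
  summarise(y = sum(y), .groups = "drop")  # drop grouping after summarising

result
```

This gives one row per combination of x and g that occurs in the data.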
Can someone explain to me why I need the across() wrapper for group_by(), but not for select()?
Thanks!
PJ