The following reprex simplifies an issue I ran into while working on a fairly complex function. The function has an argument that accepts a dynamic list (in the general sense, not the R sense) of one or more unquoted variables. After a necessary validation and transformation step, this "list" of variables becomes a character vector, which is then used for various summaries of the data. If the vector contains a single variable, !!sym() works... but not when it contains more than one.
How would you suggest handling this situation properly?
Thanks
require(tidyverse)
set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  x = sprintf('x%s', sample(1:3, 100, replace = TRUE)),
  y = sprintf('y%s', sample(1:3, 100, replace = TRUE)),
  z = sprintf('z%s', sample(1:3, 100, replace = TRUE))
)
f <- function(data, by){
  # Assume that by must be validated/filtered and is ultimately transformed into
  # a character vector
  by <- data %>% dplyr::select( {{ by }} ) %>% names()
  data %>%
    dplyr::group_by( !!sym(by) ) %>%
    dplyr::summarize(
      mean = mean(a)
    )
}
df %>% f(by = z)       # works: by is a single name
df %>% f(by = c(y, z)) # fails: by now holds two names
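The failure comes from sym() itself: it converts exactly one string into a symbol, while syms() converts a character vector into a list of symbols that !!! can splice into separate arguments. A small illustration using only rlang; the variable names (by, one, many, grouping_call) are just for the example:

```r
library(rlang)

by <- c("y", "z")

# sym() turns exactly one string into a symbol...
one <- sym("z")
is_symbol(one)
#> [1] TRUE

# ...but signals an error for a character vector of length > 1,
# which is why !!sym(by) breaks as soon as `by` holds two names
res <- tryCatch(sym(by), error = function(e) "error")
res
#> [1] "error"

# syms() returns a list of symbols instead...
many <- syms(by)
vapply(many, as_string, character(1))
#> [1] "y" "z"

# ...and !!! splices that list into separate arguments of a call
grouping_call <- expr(group_by(data, !!!many))
grouping_call
#> group_by(data, y, z)
```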
set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  x = sprintf('x%s', sample(1:3, 100, replace = TRUE)),
  y = sprintf('y%s', sample(1:3, 100, replace = TRUE)),
  z = sprintf('z%s', sample(1:3, 100, replace = TRUE))
)

# Use ... instead of a single `by` argument so that an arbitrary number of
# grouping variables can be passed
f <- function(data, ...) {
  # enquos() captures the unquoted arguments as quosures (expression plus
  # environment); .named = TRUE names each quosure after its expression
  by <- enquos(..., .named = TRUE)

  data %>%
    dplyr::group_by(!!!by) %>% # !!! (big bang) splices a list, unlike !! (bang-bang)
    dplyr::summarize(
      mean = mean(a)
    )
}

df %>% f(z)
#> # A tibble: 3 × 2
#>   z      mean
#>   <chr> <dbl>
#> 1 z1    0.190
#> 2 z2    0.439
#> 3 z3    0.176

df %>% f(y, z)
#> `summarise()` has grouped output by 'y'. You can override using the `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups:   y [3]
#>   y     z       mean
#>   <chr> <chr>  <dbl>
#> 1 y1    z1     0.262
#> 2 y1    z2     0.677
#> 3 y1    z3     0.224
#> 4 y2    z1    -0.138
#> 5 y2    z2    -0.200
#> 6 y2    z3     0.112
#> 7 y3    z1     0.505
#> 8 y3    z2     0.620
#> 9 y3    z3     0.169
All credit goes to the blog linked at the beginning; it made the solution easy to build, since it contains exactly the same example as your case.
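Switching to ... does not rule out validating the variables as plain strings: dplyr::select() accepts the dots directly, so the names can be resolved, filtered, and re-ordered before being spliced back in with !!!syms(). A minimal sketch, where f_dots() is a hypothetical name and "keep character columns, then sort" merely stands in for the real validation step; it assumes dplyr >= 1.0.0 for the .groups argument of summarize():

```r
library(dplyr)
library(rlang)

set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  x = sprintf('x%s', sample(1:3, 100, replace = TRUE)),
  y = sprintf('y%s', sample(1:3, 100, replace = TRUE)),
  z = sprintf('z%s', sample(1:3, 100, replace = TRUE))
)

# Hypothetical variant of the `...` answer: resolve the dots to column
# names first, so they can be validated and re-ordered as plain strings
f_dots <- function(data, ...) {
  # select() understands bare names (and tidyselect helpers) in `...`
  by <- data %>% dplyr::select(...) %>% names()
  # Placeholder validation/re-ordering: keep character columns, sort them
  by <- sort(by[vapply(data[by], is.character, logical(1))])
  data %>%
    dplyr::group_by(!!!rlang::syms(by)) %>%
    dplyr::summarize(mean = mean(a), .groups = "drop")
}

identical(f_dots(df, z, y), f_dots(df, y, z))
```

Because the names are ordinary strings between the select() and group_by() steps, any re-ordering rule can be applied there without touching quosures.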
I read about quosures and tried to use them. However, I could not find a way to perform the necessary validation and manipulation of the by argument once it has been turned into a quosure. For instance, how would one filter or re-order the "list" of variables inside the function? (It is important to me that this re-ordering happens inside the function rather than relying on the user to enter the arguments in this order.)
For instance, assuming that alphabetical order is required (an over-simplification of the re-ordering I actually need to perform), how does one ensure that df %>% f(by = c(z, x, y)) returns the same output as df %>% f(by = c(x, y, z))?
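The changes from what you had are minimal: simply change !!sym(by) to !!!syms(by). sym() converts a single string into a symbol, whereas syms() converts a character vector into a list of symbols, which !!! splices into group_by() as separate arguments.
Since by is an ordinary character vector at that point, the names can be sorted beforehand.
See: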
require(tidyverse)
set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  x = sprintf('x%s', sample(1:3, 100, replace = TRUE)),
  y = sprintf('y%s', sample(1:3, 100, replace = TRUE)),
  z = sprintf('z%s', sample(1:3, 100, replace = TRUE))
)
f <- function(data, by){
  # Assume that by must be validated/filtered and is ultimately transformed into
  # a character vector; here it is also sorted alphabetically
  by <- data %>% dplyr::select( {{ by }} ) %>% names() %>% sort()
  data %>%
    dplyr::group_by( !!!syms(by) ) %>%  # splice every name, not just one
    dplyr::summarize(
      mean = mean(a)
    ) %>% ungroup()
}
# TRUE: both calls end up grouping by y, then z
suppressMessages(
  identical(
    df %>% f(by = c(y, z)),
    df %>% f(by = c(z, y))
  )
)
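On dplyr 1.1.0 or later, pick() offers another way to group by a character vector without building symbols: all_of() selects the columns named in by, and group_by() uses the picked columns as grouping variables. A sketch under that version assumption; f_pick() is a hypothetical name, and the validation step is the same select()/names()/sort() chain as above:

```r
library(dplyr)

set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  x = sprintf('x%s', sample(1:3, 100, replace = TRUE)),
  y = sprintf('y%s', sample(1:3, 100, replace = TRUE)),
  z = sprintf('z%s', sample(1:3, 100, replace = TRUE))
)

# Hypothetical alternative to !!!syms(by); requires dplyr >= 1.1.0 for pick()
f_pick <- function(data, by) {
  # Resolve the tidy-selection to names, then re-order them as strings
  by <- data %>% dplyr::select({{ by }}) %>% names() %>% sort()
  data %>%
    # all_of() errors if any name in `by` is not a column of `data`
    dplyr::group_by(dplyr::pick(dplyr::all_of(by))) %>%
    dplyr::summarize(mean = mean(a), .groups = "drop")
}

identical(f_pick(df, c(z, y)), f_pick(df, c(y, z)))
```

The trade-off is mostly one of taste: !!!syms(by) works on older dplyr versions, while pick(all_of(by)) keeps the character vector inside tidyselect and reports unknown names with a tidyselect error.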