How to properly and dynamically use variables in dplyr pipelines?

nirgrahamuk · January 6, 2023, 11:12am

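The changes from what you had are minimal: simply change
!!sym(by) to !!!syms(by)
sym() accepts only a single string, whereas syms() turns a character vector into a list of symbols, which !!! then splices into group_by() as separate arguments.
The by names can be sorted beforehand.
See: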

library(tidyverse)
set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  x = sprintf('x%s', sample(1:3, 100, replace = TRUE)),
  y = sprintf('y%s', sample(1:3, 100, replace = TRUE)),
  z = sprintf('z%s', sample(1:3, 100, replace = TRUE))
)

f <- function(data, by){
  
  # Assume that `by` must be validated/filtered and is ultimately transformed
  # into a sorted character vector of column names
  by <- data %>% dplyr::select( {{ by }} ) %>% names() %>% sort()
  
  data %>% 
    dplyr::group_by( !!!syms(by) ) %>% 
    dplyr::summarize(
      mean = mean(a)
    ) %>% ungroup()
  
}

suppressMessages(
  identical(
    df %>% f(by = c(y, z)),
    df %>% f(by = c(z, y))
  )
)
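
For anyone wondering why the single-symbol form breaks, here is a minimal sketch of the difference, plus one possible alternative that avoids rlang altogether. The data frame and the names sym_fails, res_syms and res_across are made up for illustration, and the across(all_of()) variant is an option I am adding, not something from the original question; it assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(rlang)

# Hypothetical data, shaped like the df above
set.seed(12345)
df <- data.frame(
  a = rnorm(100),
  y = sprintf("y%s", sample(1:3, 100, replace = TRUE)),
  z = sprintf("z%s", sample(1:3, 100, replace = TRUE))
)

by <- c("y", "z")

# sym() accepts a single string only, so a length-2 vector raises an error
sym_fails <- inherits(try(sym(by), silent = TRUE), "try-error")

# syms() returns a list of symbols; !!! splices them into group_by()
res_syms <- df %>%
  group_by(!!!syms(by)) %>%
  summarize(mean = mean(a), .groups = "drop")

# Alternative (assumes dplyr >= 1.0.0): tidyselect inside across()
res_across <- df %>%
  group_by(across(all_of(by))) %>%
  summarize(mean = mean(a), .groups = "drop")
```

Both pipelines group by the same columns, so they should produce the same summary. The across(all_of(by)) form also fails loudly if by names a column that does not exist, which may be handy when by comes from user input.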