How can I use dplyr group_by in a function?

choi · November 26, 2019, 12:29am

This is an example.

library('tidyverse')

org_dat = tibble( dat = sample( LETTERS[1:4], 100, replace = TRUE ) ,
                  num = sample( 1:100 , replace = TRUE ) )

subsetting = function( data, col ,var ){
  return( data %>%
            filter( .[col] == var ) %>%
            group_by( .[col] ) %>%
            summarise( SUM = sum( num ) ) )
}

subsetting(data = org_dat, col = 'dat' , var = 'A' )
Error: Column .[col] is of unsupported class data.frame

How can I pass the group_by column into the function so that this call works?

mattwarkentin · November 26, 2019, 12:35am

Hi @choi,

Try this:

library(tidyverse)

org_dat = tibble( dat = sample( LETTERS[1:4], 100, replace = TRUE ) ,
                  num = sample( 1:100 , replace = TRUE ) )

subsetting <- function(data, col, var) {
  data %>%
    # {{ col }} forwards the unquoted column name to the dplyr verbs
    filter({{ col }} == var) %>%
    group_by({{ col }}) %>%
    summarise(SUM = sum(num))
}

# Note: col is passed unquoted (dat, not 'dat')
subsetting(data = org_dat, col = dat, var = 'A')

Using dplyr functions inside your own functions can be tricky because of tidy evaluation; the "Programming with dplyr" vignette covers it in more detail. dplyr functions expect unquoted variable names (literally without quotation marks), so if you want your function to work with them, you need to call it with an unquoted variable name (col = dat rather than col = 'dat'). Inside the function, wrap that argument in the special "embrace" operator {{ }}, which captures the expression the caller supplied and passes it through to the dplyr function.
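If you would rather keep calling the function with a string, as in the original col = 'dat', one possible alternative is the .data pronoun together with across(all_of()). This is only a sketch: the name subsetting_chr is made up for illustration, and it assumes dplyr 1.0.0 or later, since it relies on across().

```r
library(dplyr)

# Hypothetical string-based variant: `col` is a character column name
subsetting_chr <- function(data, col, var) {
  data %>%
    # .data[[col]] looks the column up by its name stored in `col`
    filter(.data[[col]] == var) %>%
    # across(all_of(col)) groups by the column named in `col`
    group_by(across(all_of(col))) %>%
    summarise(SUM = sum(num))
}

org_dat <- tibble(dat = sample(LETTERS[1:4], 100, replace = TRUE),
                  num = sample(1:100, replace = TRUE))

# Here the column name is quoted, unlike in the {{ }} version
subsetting_chr(data = org_dat, col = 'dat', var = 'A')
```

Which version to use mostly depends on how the function will be called: {{ }} suits interactive use with bare column names, while the string version is convenient when the column name is already stored in a character variable.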

choi · November 26, 2019, 12:39am

Thank you!

cderv · November 26, 2019, 7:10am

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
