How to pass a filtering criterion to a function?

kbzsl · October 3, 2019, 6:54am

Hi,

I'm having trouble passing a filtering criterion to a function through the dots (...). The first example (ex1) works fine, but the second one (ex2) does not. How can I pass the filtering criterion from the dots into the summarise() call inside my function?
Do you have any other suggestions for solving this problem?

Thank you for your help.

library(tidyverse)

dfex <- tibble(cat = sample(c("A", "B", "C", NA_character_), size = 1000, replace = TRUE),
               subcat = sample(c(letters, NA_character_), size = 1000, replace = TRUE))

# ex1, OK

dfex %>% 
  filter(is.na(subcat)) %>% 
  count(cat, sort = TRUE) %>% 
  mutate(p = n / sum(n))
#> # A tibble: 4 x 3
#>   cat       n     p
#>   <chr> <int> <dbl>
#> 1 C        11 0.367
#> 2 B        10 0.333
#> 3 <NA>      5 0.167
#> 4 A         4 0.133

fraction1 <- function(df, group, ...){
  group = enquo(group)
  df %>% 
    filter(...) %>% 
    count(!! group, sort = TRUE) %>% 
    mutate(p = n / sum(n))
}

dfex %>% fraction1(cat, is.na(subcat))
#> # A tibble: 4 x 3
#>   cat       n     p
#>   <chr> <int> <dbl>
#> 1 C        11 0.367
#> 2 B        10 0.333
#> 3 <NA>      5 0.167
#> 4 A         4 0.133
dfex %>% fraction1(cat, subcat == "x")
#> # A tibble: 4 x 3
#>   cat       n     p
#>   <chr> <int> <dbl>
#> 1 <NA>     13 0.351
#> 2 C        12 0.324
#> 3 B         8 0.216
#> 4 A         4 0.108

# ex2, NOT OK

dfex %>% 
  group_by(cat) %>% 
  summarise(n_condition = sum(is.na(subcat)),
            n = n()) %>% 
  mutate(p = n_condition / n) %>% 
  arrange(desc(p))
#> # A tibble: 4 x 4
#>   cat   n_condition     n      p
#>   <chr>       <int> <int>  <dbl>
#> 1 B              10   244 0.0410
#> 2 C              11   276 0.0399
#> 3 <NA>            5   239 0.0209
#> 4 A               4   241 0.0166

fraction2 <- function(df, group, ...){
  group = enquo(group)
  df %>% 
    group_by(!! group) %>% 
    summarise(n_condition = sum(...),
              n = n()) %>% 
    mutate(p = n_condition / n) %>% 
    arrange(desc(p))
}

dfex %>% fraction2(cat, is.na(subcat))
#> Error: object 'subcat' not found
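
A likely explanation for the difference between ex1 and ex2, sketched with a toy example: filter(...) hands the dots over as whole arguments, and filter() quotes them itself and evaluates them inside the data. In sum(...), the dots are part of an expression that summarise() evaluates; ... still refers to the caller's promises, so is.na(subcat) is evaluated in the calling environment, where no subcat column exists. The data frame toy and the functions forward_dots() and dots_in_expr() are hypothetical, made up only for illustration.

```r
library(dplyr)

# Hypothetical toy data: one grouping column, one column with missing values
toy <- tibble(grp = c("a", "a", "b"), score = c(1, NA, NA))

# Dots forwarded as whole arguments: filter() quotes them itself,
# so `score` is looked up in the data
forward_dots <- function(df, ...) {
  df %>% filter(...)
}

# Dots used inside an expression: `...` resolves to the caller's promises,
# which are evaluated outside the data, so `score` is not found
dots_in_expr <- function(df, ...) {
  df %>% summarise(n_condition = sum(...))
}

forward_dots(toy, is.na(score))   # keeps the 2 rows where score is NA
try(dots_in_expr(toy, is.na(score)))  # fails: object 'score' not found
```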

mishabalyasin · October 3, 2019, 11:00am

You could do it this way, which, arguably, is even easier to modify:

library(tidyverse)

dfex <- tibble(cat = sample(c("A", "B", "C", NA_character_), size = 1000, replace = TRUE),
               subcat = sample(c(letters, NA_character_), size = 1000, replace = TRUE))

fraction2 <- function(df, group, ...){
  group = enquo(group)
  df %>% 
    group_by(!! group) %>% 
    summarise(...,
              n = n()) %>% 
    mutate(p = n_condition / n) %>% 
    arrange(desc(p))
}

dfex %>% fraction2(cat, n_condition = sum(is.na(subcat)))
#> # A tibble: 4 x 4
#>   cat   n_condition     n      p
#>   <chr>       <int> <int>  <dbl>
#> 1 B              10   238 0.0420
#> 2 A               9   243 0.0370
#> 3 <NA>            8   250 0.032 
#> 4 C               5   269 0.0186

Created on 2019-10-03 by the reprex package (v0.3.0)

kbzsl · October 3, 2019, 5:10pm

Thank you for your suggestion. It is a perfect solution and it solves my problem, but I have a few concerns.

With this solution I have to define the n_condition variable in the function call, but my original intention was to define it only inside the function. Logically, I think the summing (counting the rows where the condition is met) belongs in the function definition.
I am still learning tidy eval, and sometimes the learning curve feels a bit steep. I hoped that solving my original problem would help me better understand how tidy eval, closures, and environments work.

mishabalyasin · October 3, 2019, 5:58pm

Yes, you can also do it with your original formulation, like this:

library(tidyverse)

dfex <- tibble(cat = sample(c("A", "B", "C", NA_character_), size = 1000, replace = TRUE),
               subcat = sample(c(letters, NA_character_), size = 1000, replace = TRUE))

fraction2 <- function(df, group, ...){
  group = enquo(group)
  condition <- enquos(...)
  df %>% 
    group_by(!! group) %>% 
    summarise(n_condition = sum(!!!condition),
              n = n()) %>% 
    mutate(p = n_condition / n) %>% 
    arrange(desc(p))
}

dfex %>% fraction2(cat, is.na(subcat))
#> # A tibble: 4 x 4
#>   cat   n_condition     n      p
#>   <chr>       <int> <int>  <dbl>
#> 1 C               9   270 0.0333
#> 2 A               8   256 0.0312
#> 3 B               7   226 0.0310
#> 4 <NA>            7   248 0.0282

Created on 2019-10-03 by the reprex package (v0.3.0)
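
One concrete pitfall of splicing the dots into sum(): !!! inserts each captured condition as a separate argument, so two conditions become sum(cond1, cond2), which adds the two counts instead of counting the rows where both hold. A sketch with a hypothetical toy data frame and a hypothetical count_cond() helper that follows the same pattern as fraction2():

```r
library(dplyr)
library(rlang)

# Hypothetical toy data
toy <- tibble(grp = c("a", "a", "b"), score = c(1, NA, NA))

# Same pattern as fraction2(): capture the dots, splice them into sum()
count_cond <- function(df, ...) {
  condition <- enquos(...)
  df %>% summarise(n_condition = sum(!!!condition))
}

# One condition: counts the rows where score is NA
count_cond(toy, is.na(score))              # n_condition = 2
# Two conditions: becomes sum(is.na(score), grp == "a"), i.e. 2 + 2 = 4,
# not the single row where both conditions are true
count_cond(toy, is.na(score), grp == "a")  # n_condition = 4
```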

However, I do want to mention that this way of doing things is rather obscure and likely to lead to confusion down the line. Obviously, this is only an approximation of your real problem, so it's hard to say for certain whether that matters for your specific use case. Just a word of caution.
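
If the goal is for the function itself to own the counting logic, a single named argument avoids the dots altogether. The sketch below uses the {{ }} ("curly-curly") operator, available from rlang 0.4.0; it is a swapped-in alternative rather than the approach posted above, and the name fraction3() is hypothetical.

```r
library(dplyr)

# Hypothetical variant: `condition` is one logical expression, embraced
# with {{ }} so it is evaluated inside the grouped data
fraction3 <- function(df, group, condition) {
  df %>%
    group_by({{ group }}) %>%
    summarise(n_condition = sum({{ condition }}),
              n = n()) %>%
    mutate(p = n_condition / n) %>%
    arrange(desc(p))
}

dfex <- tibble(cat = sample(c("A", "B", "C", NA_character_), size = 1000, replace = TRUE),
               subcat = sample(c(letters, NA_character_), size = 1000, replace = TRUE))

res <- dfex %>% fraction3(cat, is.na(subcat))
res
```

Unlike the dots version, combining conditions is left to the caller, e.g. fraction3(dfex, cat, is.na(subcat) & cat == "A"), which keeps the meaning of n_condition unambiguous.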

kbzsl · October 4, 2019, 1:05pm

Thank you for your help and recommendations.

I am still wondering how to decide when it is enough to simply pass the dots along and when quoting them is required. In the first example both solutions work:

fraction1 <- function(df, group, ...){
  group = enquo(group)
  df %>% 
    filter(...) %>%
    count(!! group, sort = TRUE) %>% 
    mutate(p = n / sum(n))
}

vs

fraction1 <- function(df, group, ...){
  group = enquo(group)
  condition <- enquos(...)
  df %>% 
    filter(!!!condition) %>%
    count(!! group, sort = TRUE) %>% 
    mutate(p = n / sum(n))
}

But if I interpret Hadley’s answer correctly, tidy eval is working behind the scenes even in the first case.
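
The equivalence can be checked directly. This sketch assumes dplyr and rlang are installed; fraction1_dots() and fraction1_quo() are hypothetical names for the two variants, and set.seed() is added only to make the sample data reproducible. Both work because filter() captures its own dots: forwarding ... hands it the caller's expressions, while enquos() plus !!! hands it the same expressions already wrapped as quosures.

```r
library(dplyr)
library(rlang)

set.seed(42)  # only so the sample data are reproducible
dfex <- tibble(cat = sample(c("A", "B", "C", NA_character_), size = 1000, replace = TRUE),
               subcat = sample(c(letters, NA_character_), size = 1000, replace = TRUE))

# Variant 1: forward the dots unchanged; filter() quotes them itself
fraction1_dots <- function(df, group, ...) {
  group <- enquo(group)
  df %>%
    filter(...) %>%
    count(!!group, sort = TRUE) %>%
    mutate(p = n / sum(n))
}

# Variant 2: capture the dots as quosures, then splice them back in
fraction1_quo <- function(df, group, ...) {
  group <- enquo(group)
  condition <- enquos(...)
  df %>%
    filter(!!!condition) %>%
    count(!!group, sort = TRUE) %>%
    mutate(p = n / sum(n))
}

# Compare the two results on the same data and condition
all.equal(fraction1_dots(dfex, cat, is.na(subcat)),
          fraction1_quo(dfex, cat, is.na(subcat)))
```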

hadley · October 4, 2019, 1:23pm

There is no reason to use the second form unless you're going to compute on condition in some way.
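
As an illustration of what computing on the captured condition can mean, here is a sketch where the quoted form is actually needed: each condition is wrapped in its own sum(), and the output column is named after the condition's text via enquos(..., .named = TRUE). count_each() is a hypothetical helper, and na.rm = TRUE is an added assumption so that comparisons against NA values are not counted. Simply forwarding the dots could not do this, because the function never sees the individual expressions.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: one count column per condition, per group
count_each <- function(df, group, ...) {
  # .named = TRUE names each quosure after its expression, e.g. "is.na(subcat)"
  conditions <- enquos(..., .named = TRUE)
  # Compute on the quosures: wrap each one in sum(), ignoring NA results
  sums <- lapply(conditions, function(cond) quo(sum(!!cond, na.rm = TRUE)))
  df %>%
    group_by({{ group }}) %>%
    summarise(!!!sums, n = n())
}

dfex <- tibble(cat = sample(c("A", "B", "C", NA_character_), size = 1000, replace = TRUE),
               subcat = sample(c(letters, NA_character_), size = 1000, replace = TRUE))

res <- count_each(dfex, cat, is.na(subcat), subcat == "x")
res
```

Column names built this way, such as is.na(subcat), are non-syntactic, so later code has to refer to them with backticks.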
