How to use filter in a dplyr function call

I am trying to write a function, but the second filter condition
{{var1}} == 1
in this function eliminates all rows. Can someone help with the syntax?

compute_num0 = function(var1) {
  res_num = df1 %>%
    filter(!is.na({{var1}}),  {{var1}} == 1)  %>%  
    group_by(year, city, .drop = FALSE) %>% summarize(counts = n()) %>% 
    select(year,city,counts) %>% as.data.frame()
  res_num 
}

compute_num0('x') # produces all zero counts

This is some sample data

library(tidyverse)

set.seed(2021)
numRows = 1000

df1 = data.frame(year = sample(2010:2018, size = numRows, replace = TRUE),
                 race = sample(c('white', 'black', 'Asian', 'Hispanic'), size = numRows, replace = TRUE),
                 city = sample(c('Oakland', 'Berkeley','Fremont'), size = numRows, replace = TRUE),
                 young = sample(c(1,2,NA), size = numRows, replace = TRUE),
                 old = sample(c(1,2,NA), size = numRows, replace = TRUE),
                 x = sample(x = c(1,2, NA), size = numRows, replace = TRUE, prob = c(.7, .2, .1)),
                 y = sample(x = 1:10, size = numRows, replace = TRUE),
                 z = sample(x = 1:10, size = numRows, replace = TRUE))

df1$year = factor(df1$year)
df1$race = factor(df1$race)
df1$city = factor(df1$city)

suppressPackageStartupMessages({
  library(dplyr)
})


compute_num0 = function(var1) {
  df1 %>%
    filter(!is.na({{var1}})) %>%  
    group_by(year, city, .drop = FALSE) %>% 
    summarize(counts = n()) %>% 
    select(year,city,counts)
}

set.seed(2021)
numRows <- 1000

df1 <- data.frame(
  year = sample(2010:2018, size = numRows, replace = TRUE),
  race = sample(c("white", "black", "Asian", "Hispanic"), size = numRows, replace = TRUE),
  city = sample(c("Oakland", "Berkeley", "Fremont"), size = numRows, replace = TRUE),
  young = sample(c(1, 2, NA), size = numRows, replace = TRUE),
  old = sample(c(1, 2, NA), size = numRows, replace = TRUE),
  x = sample(x = c(1, 2, NA), size = numRows, replace = TRUE, prob = c(.7, .2, .1)),
  y = sample(x = 1:10, size = numRows, replace = TRUE),
  z = sample(x = 1:10, size = numRows, replace = TRUE)
)

df1$year <- factor(df1$year)
df1$race <- factor(df1$race)
df1$city <- factor(df1$city)

compute_num0('x') 
#> `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
#> # A tibble: 27 x 3
#> # Groups:   year [9]
#>    year  city     counts
#>    <fct> <fct>     <int>
#>  1 2010  Berkeley     36
#>  2 2010  Fremont      49
#>  3 2010  Oakland      36
#>  4 2011  Berkeley     23
#>  5 2011  Fremont      38
#>  6 2011  Oakland      28
#>  7 2012  Berkeley     31
#>  8 2012  Fremont      38
#>  9 2012  Oakland      38
#> 10 2013  Berkeley     47
#> # … with 17 more rows

Thanks, but I think you omitted the second test in the filter function, {{var1}} == 1 and that is central to my problem.

I did, purposefully, because the problem was described as "produces all zero counts", and removing {{var1}} == 1 produces non-zero counts. What is {{var1}} == 1 intended to do?

I guess it's an attempt to filter on the factor level whose internal integer representation equals 1.
That strikes me as extremely odd, but it can be done by casting var1 to integer or numeric before testing for 1.

Thanks, but x is numeric.

Ah, OK. Try it without quoting:
x, not 'x'
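
A likely explanation for why the quoted call returns nothing: {{var1}} forwards whatever expression the caller wrote. With x that is the column; with 'x' it is the literal string "x", so the second condition becomes "x" == 1, which is FALSE for every row. The sketch below illustrates this on a small made-up data frame (toy and keep_ones are hypothetical names, not from the thread):

```r
library(dplyr)

# Hypothetical toy data, standing in for df1 from the question
toy <- data.frame(x = c(1, 2, NA, 1))

# Same two filter conditions as in compute_num0()
keep_ones <- function(data, var1) {
  data %>% filter(!is.na({{ var1 }}), {{ var1 }} == 1)
}

# Unquoted: {{ var1 }} refers to the column x, so rows with x == 1 are kept
n_unquoted <- nrow(keep_ones(toy, x))

# Quoted: {{ var1 }} is the constant "x"; "x" == 1 is FALSE for every row
n_quoted <- nrow(keep_ones(toy, "x"))

print(c(unquoted = n_unquoted, quoted = n_quoted))
```

Here the unquoted call keeps the two rows where x is 1, and the quoted call keeps none.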

Nir, thank you. That solved the problem.

Now there is another issue. When I only have the !is.na() condition, both function calls produce non-zero results but the call with the quotes produces the wrong answer!


filter(!is.na({{var1}}), {{var1}} == 1) # fails with zeroes as output
filter(!is.na({{var1}})) # produces non-zero output

compute_num0('x') #These two calls produce different answers
compute_num0(x)
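
The difference probably has the same cause. With the quoted call, !is.na({{var1}}) becomes !is.na("x"), which is always TRUE, so no rows are dropped and the counts include the rows where x is NA. If the function should accept a column name as a string, one option is the .data pronoun instead of {{ }}. A minimal sketch on made-up data (toy, drop_na_embraced and drop_na_string are hypothetical names used only for illustration):

```r
library(dplyr)

# Hypothetical toy data with one missing value in x
toy <- data.frame(x = c(1, 2, NA, 1))

# Embracing: expects an unquoted column name
drop_na_embraced <- function(data, var1) {
  data %>% filter(!is.na({{ var1 }}))
}

# .data pronoun: expects the column name as a string
drop_na_string <- function(data, var1) {
  data %>% filter(!is.na(.data[[var1]]))
}

n_embraced_bare   <- nrow(drop_na_embraced(toy, x))    # NA row is dropped
n_embraced_quoted <- nrow(drop_na_embraced(toy, "x"))  # nothing is dropped
n_string          <- nrow(drop_na_string(toy, "x"))    # NA row is dropped

print(c(n_embraced_bare, n_embraced_quoted, n_string))
```

In this sketch the embraced version given "x" keeps all four rows, while the other two calls keep three.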
