embrace operator for tidy selection vs data masking

Andrew · September 7, 2023, 1:46pm

I've been wondering about the embrace operator {{...}} and how it behaves differently in tidy selection and data masking contexts.

For dplyr verbs that use tidy selection, I can write a function using the embrace operator and it will accept an unquoted column name, a quoted column name, or an embraced variable holding the name as its argument. For example:

library(dplyr)

f <- function(df, col) {
  range(pull(df, {{col}}))
}

f(mtcars, disp)
#> [1]  71.1 472.0
f(mtcars, "disp")
#> [1]  71.1 472.0

var <- "disp"
f(mtcars, {{var}})
#> [1]  71.1 472.0

all give the same result.
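For comparison, here is a minimal sketch of the pattern usually suggested when the column names are already held as strings in a tidy selection context: wrap the character vector in all_of(). The helper name f_sel and its cols argument are made up for illustration, and the example uses select() rather than pull().

```r
library(dplyr)

# Hypothetical helper: `cols` is expected to be a character vector of column names
f_sel <- function(df, cols) {
  # all_of() tells tidy selection to treat `cols` as column names, not as a column
  select(df, all_of(cols))
}

vars <- c("disp", "hp")
picked <- f_sel(mtcars, vars)
names(picked)
```

Unlike the embraced version, this variant only accepts strings, so it trades flexibility for an unambiguous interface.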

However, in a data masking context, only the unquoted column name works:

f <- function(df, col) {
  df <- mutate(df, new = {{col}} * 100)
  tibble(df[1,])
}

f(mtcars, disp)
#> # A tibble: 1 × 12
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   new
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4 16000
f(mtcars, "disp")
#> Error in `mutate()`:
#> ℹ In argument: `new = "disp" * 100`.
#> Caused by error in `"disp" * 100`:
#> ! non-numeric argument to binary operator

var <- "disp"
f(mtcars, {{var}})
#> Error in `mutate()`:
#> ℹ In argument: `new = "disp" * 100`.
#> Caused by error in `"disp" * 100`:
#> ! non-numeric argument to binary operator

Presumably this is because the {{col}} isn't unquoting when it's doing the mutate.
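For reference, a minimal sketch of the .data pronoun, which is the usual way to look up a column from a string inside a data masking verb. The helper name f_chr is made up for illustration, and it assumes col is always passed as a string.

```r
library(dplyr)

# Hypothetical helper: `col` is expected to be a string naming a column
f_chr <- function(df, col) {
  # .data[[col]] looks the column up by name inside the data mask
  df <- mutate(df, new = .data[[col]] * 100)
  tibble(df[1, ])
}

var <- "disp"
res_literal  <- f_chr(mtcars, "disp")
res_variable <- f_chr(mtcars, var)
```

This covers the two string cases above, but not the unquoted disp case, which still needs the embrace operator.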

I can get around this by using rlang::ensym() with bang-bang (!!), which works the same in both tidy selection

f <- function(df, col) {
  col <- rlang::ensym(col)
  range(pull(df, !!col))
}

f(mtcars, disp)
#> [1]  71.1 472.0
f(mtcars, "disp")
#> [1]  71.1 472.0

var <- "disp"
f(mtcars, {{var}})
#> [1]  71.1 472.0

and data masking

f <- function(df, col) {
  col <- rlang::ensym(col)
  df <- mutate(df, new = !!col * 100)
  tibble(df[1,])
}

f(mtcars, disp)
#> # A tibble: 1 × 12
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   new
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4 16000
f(mtcars, "disp")
#> # A tibble: 1 × 12
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   new
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4 16000

var <- "disp"
f(mtcars, {{var}})
#> # A tibble: 1 × 12
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb   new
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4 16000

contexts.

So, for me, it seems the best option when writing functions that call dplyr verbs or ggplot aesthetics is the rlang::ensym() plus bang-bang approach. However, ensym() isn't mentioned at all in the Programming with dplyr or the Using ggplot2 in Packages vignettes. It seems like a "catch-all" solution (with the added bonus that I can use rlang::as_name() to use the argument in standard evaluation functions), but I feel like I must be missing something when other approaches are given more airtime and the rlang documentation states that "expr(), enquo(), and enquos() are sufficient for most purposes but rlang provides these other operations, either for completeness or because they are useful to experts". I'm not sure I'm much of an expert in this area!
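To make the as_name() point concrete, here is a small sketch of passing the captured argument on to standard evaluation code. The function name f_base is made up for illustration; only rlang::ensym() and rlang::as_name() are real API.

```r
# Hypothetical helper: accepts a bare or quoted column name, then uses base R
f_base <- function(df, col) {
  # ensym() captures a symbol or string; as_name() turns it into a plain string
  col_name <- rlang::as_name(rlang::ensym(col))
  range(df[[col_name]])
}

r_bare   <- f_base(mtcars, disp)
r_quoted <- f_base(mtcars, "disp")
```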

Does anyone else have any thoughts on this?

technocrat · September 7, 2023, 11:47pm

This illustrates one of the reasons I have fallen away from the tidy dialect—it can be very difficult to figure out what is happening in cases like you illustrate. (The main reason is the weight of syntax makes it difficult to reason about how things should work.)

f can be modified to work as it should with a touch of base:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

f <- function(df, col) {
  if(is.null(df$col)) {
    df = mutate(df, new = df$col)
  }
  else {
    df <- mutate(df, new = col * 100)
  }
  tibble(df[1,])
}

f(mtcars, disp)
#> # A tibble: 1 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4
f(mtcars, "disp")
#> # A tibble: 1 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    21     6   160   110   3.9  2.62  16.5     0     1     4     4

Created on 2023-09-07 with reprex v2.0.2

Andrew · September 8, 2023, 6:50am

I'm not sure that's correct. In neither case does the "new" column appear, because df$col is always NULL. Replacing df$col with df[[col]] would be the way to go in base R, but in that case the if statement wouldn't work where col is an unquoted column name. And then you're still left with the problem of how to handle col in the mutate() operation.

I guess my point was that on the surface, the rlang::ensym() plus bang-bang (!!) approach appears to work in all 3 cases (unquoted column name, column name as a string, or embraced string variable for column name) regardless of whether the next step in my function calls a function that uses data masking or tidy selection. However, it seems you have to dig quite deep into Hadley's Advanced R to find an example using it, whereas the more visible documentation doesn't really mention it. This kind of makes me think I must be missing something important! However, I think re-reading that section of the book has confirmed for me that this is the approach that suits the functions I'm writing best.
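One trade-off that may explain the difference in emphasis, shown as a hedged sketch: ensym() only accepts a single symbol or string, whereas the embrace operator passes through any expression. The function names f_embrace and f_ensym are made up for illustration.

```r
library(dplyr)

# Embrace version: the argument may be any data-masked expression
f_embrace <- function(df, col) {
  mutate(df, new = {{ col }} * 100)$new[1]
}

# ensym version: the argument must be a single column name (bare or quoted)
f_ensym <- function(df, col) {
  col <- rlang::ensym(col)
  mutate(df, new = !!col * 100)$new[1]
}

ok <- f_embrace(mtcars, disp / 2)

# ensym() refuses a call such as `disp / 2`, so this is caught as an error
failed <- tryCatch(f_ensym(mtcars, disp / 2), error = function(e) "error")
```

So the ensym() approach fits functions that should take exactly one column name, while the embrace operator fits functions that should accept arbitrary expressions.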

technocrat · September 8, 2023, 10:16am

I thought the intention was for the function to work whether or not the column name was quoted. I wasn't trying to cover use of the embrace
