Dropping multiple columns with tidyeval (not so painful)

I've read "Tidyeval: Dropping a column using Select( ) within a function?" but I'm interested in dropping multiple columns.

My problem with the approach in "SO: Remove columns the tidyeval way" is that passing and manipulating parameters as strings is not really what tidy eval is about. I want to manipulate lists of symbols directly.

Let's assume we've loaded the right stuff and want to drop columns in a table programmatically (i.e., by reference).

library(rlang)
library(dplyr)

Referring to a single quoted variable is easy once you get your head around it.

grpdim <- quo(Sepal.Length)
iris %>% select(!!grpdim) # keeps Sepal.Length

And for reasons I don't entirely understand, select() will accept the negation of a single quosure:

iris %>% select(-!!grpdim) # drops Sepal.Length

Now suppose we have a list of variables.
I think the idiomatic way to store these would be a list of quosures. I do this often now to avoid duplicating lists of grouping variables. And it's straightforward to select() columns by unquoting this list.

grpdims <- quos(Species, Sepal.Length)
iris %>% select(!!!grpdims) # keeps Species, Sepal.Length

But the negation does not work as expected.

iris %>% select(-!!!grpdims) # keeps Petal.Width (?!)

I found a syntax that gives the desired result but it's quite nasty.

iris %>% select(!!!lapply(lapply(grpdims, quo_expr), function(x) quo(-!!x)))

To understand what's happening here look at these partial results and then please suggest something better.

lapply(grpdims, quo_expr) # a list of bare names
lapply(lapply(grpdims, quo_expr), function(x) quo(-!!x)) # a list of negated quosures

Fwiw, this works: select(iris, -c(!!!grpdims))

Maybe the reasoning for c(!!!v) is similar to why one does list(...) inside a function.
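For anyone who wants to reuse this inside a function, here is a minimal sketch. The helper name drop_cols() is made up for illustration (it is not part of dplyr), and it assumes the arguments are bare column names, so it captures them as plain symbols with ensyms() before splicing them into c().

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not part of dplyr): drops the columns named in `...`.
drop_cols <- function(data, ...) {
  vars <- ensyms(...)          # capture the bare column names as symbols
  select(data, -c(!!!vars))    # splice them into c(), then negate the whole set
}

names(drop_cols(iris, Species, Sepal.Length))
#> [1] "Sepal.Width"  "Petal.Length" "Petal.Width"
```

The splice happens inside c() so that the unary minus applies to one combined selection rather than to the first spliced argument only.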

Brilliant! I feel like I need to change the title from "painful" to "not obvious to me".

The next version has full support of character vectors so this will work:

vars <- c("cyl", "am")
select(mtcars, - !! vars)

In fact the following would also work:

select(mtcars, -vars)

Except when the data frame has a vars column, which is why it's preferable to unquote it early.

Edit: Forgot to add that only selecting functions will support character vectors this way. For other kinds of functions that would be ambiguous because strings are valid input, e.g. mutate(mtcars, "foo") creates a new column by recycling "foo". Because of this ambiguity you need to unquote expressions in the general case.
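A small sketch of that difference, assuming a reasonably recent dplyr (1.0 or later), where the all_of() helper is available to mark a character vector as external to the data frame. That helper is not mentioned above; it is used here only because it sidesteps the "data frame has a vars column" ambiguity.

```r
library(dplyr)

vars <- c("cyl", "am")

# In a selecting function the strings are treated as column names.
# all_of() says explicitly that `vars` is a vector, not a column of mtcars.
dropped <- select(mtcars, -all_of(vars))
names(dropped)

# In mutate() a string is an ordinary value: it is recycled into a new column.
recycled <- mutate(mtcars, "foo")
ncol(recycled)
```

Here dropped has the 9 remaining columns, while recycled has 12: the original 11 plus one filled with "foo".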

I guess I'm confused about the general approach one should take to handling lists of variables. I've been moving towards quos(var1, var2) rather than c("var1", "var2"). Is that still the preferred approach?

Yes, it is fine to create expressions containing column names; this is the approach that will work most generally. The support for character vectors in select() is for convenience.

Frank's answer is perfectly valid. Here's the output when we wrap it with expr() to examine the expansion:

vars <- syms(c("a", "b", "c"))
expr(-c(!!! vars))
#> -c(a, b, c)

Got it. Just to document here, when I start with quosures rather than strings the expression I pass looks like this.

qars <- quos(a, b, c)
expr(-c(!!! qars))
#> -c(~a, ~b, ~c)

Right, here are the three possibilities:

# Strings
expr(-c(!!! c("a", "b", "c")))
#> -c("a", "b", "c")

# Quoted symbols
expr(-c(!!! syms(c("a", "b", "c"))))
#> -c(a, b, c)

# Quosured symbols
expr(-c(!!! quos(a, b, c)))
#> -c(~a, ~b, ~c)

It is not necessary to wrap symbols in quosures if you're only referring to data frame columns, which would most likely be the case with select(). The only added value of the quosure is that it carries information about the context where it was created so that you can refer to local variables.
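To make the "carries its context" part concrete, here is a rough sketch. The function make_quo() and the variable threshold are invented for the example; the point is only that the quosure remembers the environment where threshold was defined.

```r
library(rlang)

# Hypothetical function: `threshold` exists only inside its body.
make_quo <- function() {
  threshold <- 5
  quo(threshold)
}

q <- make_quo()
quo_get_expr(q)  # the bare symbol `threshold`
eval_tidy(q)     # evaluated in the environment recorded by the quosure
#> [1] 5
```

A plain symbol has no such environment attached, which is fine when the name is meant to be looked up among the data frame's columns.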

I've been using quosured symbols to avoid all the quotation marks.

Is there a fourth version where we delay evaluation of the symbols but don't bother capturing the context?
My first attempt crashes R.

# Bare expression
expr(-c(!!! expression(a, b, c))) # crashes R

Ohh, that's not good! I'll fix that, thanks.

expression() is a base function that creates objects of a very peculiar type that are no longer put to any real use except as the return value of base::parse(). It is a bit confusing, but when we mention "expression" we pretty much never mean that particular type; we mean either symbols or calls.
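A quick way to see the distinction, as a sketch using only base R plus the rlang predicates is_symbol() and is_call(): the container has its own type, while its elements are the symbols and calls we usually mean.

```r
library(rlang)

base_exprs <- expression(a, b + 1)

typeof(base_exprs)                # "expression": a vector type of its own
typeof(parse(text = "a; b + 1"))  # parse() returns the same type

is_symbol(base_exprs[[1]])        # the first element is the symbol `a`
is_call(base_exprs[[2]])          # the second element is the call `b + 1`
```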

To capture a list of raw expressions you can use the plural version of expr():

expr(-c(!!! exprs(a, b, c)))
#> -c(a, b, c)
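Applied to the original question, that would look roughly like this (a sketch on iris; the names cols and trimmed are just placeholders):

```r
library(dplyr)
library(rlang)

cols <- exprs(Species, Sepal.Length)   # a plain list of symbols, no environments
trimmed <- select(iris, -c(!!!cols))   # expands to select(iris, -c(Species, Sepal.Length))
names(trimmed)
#> [1] "Sepal.Width"  "Petal.Length" "Petal.Width"
```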

It also supports unquoting:

exprs(
  !! 10 + 1,
  !!! letters[1:3]
)
#> [[1]]
#> 11
#>
#> [[2]]
#> [1] "a"
#>
#> [[3]]
#> [1] "b"
#>
#> [[4]]
#> [1] "c"