Should tidyeval be abandoned?

pavopax · October 25, 2017, 5:41am

I'm being provocative on purpose, but I have a point, all in the spirit of the "tenth man rule".

Long-time dplyr user/advocate here.

I'm struggling with understanding quosures, quasiquotation, !!!

I've attempted to finish reading the programming vignette multiple times. Simply put, the design seems much less intuitive than everything else in dplyr.

I'm curious: does anyone really like the new style?

With experience, do people begin liking it?

If, to the community, the design seems contrary to the spirit of dplyr and tidyverse, can we put a brake on this and reconsider our options?

Note: I am far from a hater.

ftobin · October 25, 2017, 5:53am

In all the code my team writes, we never pass bare words around, only strings. The evaluation context of bare words varies so much from one R function to another that, coming from languages with more standard evaluation (e.g., every other language), it makes no sense to promote their use. We either use seplyr, dplyr's pronoun (.data), or base-R functions that use strings for dealing with data frames.
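A minimal sketch of the string-based style with the .data pronoun, assuming dplyr 0.7 or later is installed; the helper name `mean_of` and the use of the built-in `mtcars` data are illustrative only. The `.data[[col]]` form looks a column up by a name held in a string, so no bare words are passed around:

```r
library(dplyr)

# Hypothetical helper: the column is passed as a character string, not a bare word
mean_of <- function(df, col) {
  # .data[[col]] refers to the column whose name is stored in `col`
  summarise(df, result = mean(.data[[col]]))
}

res <- mean_of(mtcars, "mpg")
```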

epi_user · October 25, 2017, 10:50am

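I find tidyeval much easier to read and write than lazyeval.
Doing something like

```r
lazyeval::interp(~var, var = as.name(column))
```

was pretty horrendous.
I'm much happier with (e.g.) capturing the argument with enquo() and unquoting it with !! (wrapped here in an illustrative function, sum_column):

```r
library(dplyr)
sum_column <- function(df, column) {
  var <- enquo(column)           # capture the bare column name
  summarise(df, n = sum(!!var))  # unquote it inside summarise()
}
```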

yutannihilation · October 25, 2017, 11:45am

As a user, I like tidyeval. It's much easier than lazyeval. I can't even remember how I used lazyeval...

But I agree with you that tidyeval seems very different from everything else. To some people, it looks like an abominable SAS macro. Given the complex nature of environments and evaluation, it's reasonable to employ such a unique design, but it's hard to explain the concept of tidyeval to other people (I'd be very glad if there were another version of the vignette with fewer conceptual terms...).

So I tend to recommend using the scoped functions (e.g. summarise_at() and mutate_at()), which are sufficient for most cases if you are not going to write a dplyr-like package.
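As an illustration of the scoped variants, here is a sketch assuming dplyr 0.7 or later (the `_at` functions are superseded, but still available, in later releases); `cols` is an illustrative name. The columns are chosen with an ordinary character vector, so no quoting or unquoting is involved:

```r
library(dplyr)

# Column names held as ordinary strings
cols <- c("mpg", "hp")

# summarise_at() applies mean() to each selected column
res <- summarise_at(mtcars, cols, mean)
```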

tjmahr · October 25, 2017, 1:15pm

I have to echo this point. It is much easier than the earlier way of doing things. I also like the splicing operators because they save you a lot of busywork with manipulating arguments.
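A small sketch of the splicing operator !!!, assuming dplyr and rlang are installed; `group_cols` is a hypothetical variable. `syms()` turns a character vector into a list of symbols, and `!!!` splices that list into `group_by()` as if each name had been typed as a separate argument:

```r
library(dplyr)
library(rlang)

group_cols <- c("cyl", "gear")

# Equivalent to group_by(mtcars, cyl, gear)
res <- mtcars %>%
  group_by(!!!syms(group_cols)) %>%
  summarise(n = n())
```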

The biggest problem I think is that the framework is hard to learn, because we don't have many learning resources yet (be they vignettes, tutorials, or packages we can snoop on for ideas) and because the concepts are difficult to talk about. It is a challenge to get up to speed with tidyeval. But the design is an upgrade, and the documentation can always be improved.

One of my stumbling blocks with the tidyeval framework is that I was too quick to jump into quoting/unquoting.

When we're writing packages and want to use dplyr functions, we have to use rlang to quote the names so that CRAN package checks do not wrongly warn about undefined global variables. For a while, I was annoyed because I thought that I had to quote-unquote any bare name used in a dplyr function, using code like

```r
s_time <- rlang::sym("time")
mutate(df, time2 = round(!!s_time, 2))
```

But it turns out that we now have a .data pronoun, which lets us skip that step.

```r
mutate(df, time2 = round(.data$time, 2))
```

And I've started using that in my package code.
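To check that the two spellings agree, here is a sketch assuming dplyr and rlang are installed, with a made-up data frame `df` that has a `time` column:

```r
library(dplyr)
library(rlang)

df <- data.frame(time = c(1.234, 5.678, 9.1011))

# Quote-unquote style: build a symbol, then unquote it with !!
s_time <- sym("time")
a <- mutate(df, time2 = round(!!s_time, 2))

# Pronoun style: .data$time refers to the `time` column of df
b <- mutate(df, time2 = round(.data$time, 2))
```

In package code, `.data` is typically imported from rlang (for example with a roxygen `@importFrom rlang .data` tag) so that the pronoun itself is not flagged as an undefined global.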

My point being that it can be frustrating and tedious because it's not yet clear what the best practices are for using tidyeval, but that's a manageable problem as the community writes more tutorials, cookbooks and cheatsheets.

nakamichi · October 25, 2017, 3:49pm

I use both R and SAS in everyday work, and I think tidyeval is much easier to understand than SAS macro.

At first, SAS macro seems to be just string substitution, but it has many hard-to-understand quoting rules and interactions with the base SAS language. As a result, SAS users have to remember the syntax of two distinct languages.

In contrast, tidyeval is based on ordinary semantics of R and we do not need to learn another new language, although it uses a lot of non-standard evaluation techniques and we have to learn some new notations such as !! and !!!.

Non-standard evaluation is, however, a useful built-in feature of R, and it would be a little difficult to learn even without the tidyeval framework. So I think the authors of tidyeval are trying to strike a good balance between usability and incomprehensible magic, and they have done a fairly good job so far.
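To illustrate that non-standard evaluation is already part of base R, here is a sketch using only base functions; `show_expr` is a hypothetical name. `substitute()` returns the unevaluated expression an argument was called with, and `eval()` can evaluate that expression with a data frame's columns in scope, which is the kind of mechanism tidyeval builds on:

```r
# Capture the expression the caller wrote, without evaluating it
show_expr <- function(x) substitute(x)

e <- show_expr(mpg * 2)   # the call `mpg * 2`, not a number

# Evaluate the captured expression with the columns of mtcars in scope
vals <- eval(e, mtcars)
```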

rkahne · October 25, 2017, 6:12pm

I wonder if the approach to tidyeval depends on how much standard evaluation one needs. For instance, I do kind of miss the filter_/mutate_ syntax, because it was super easy. However, I used it very rarely and in very straightforward cases. I wasn't a heavy user of lazyeval, mostly because it seemed so very daunting to approach. However! I have been able to get the tidyeval syntax to do what I want.

Anyways, I definitely don't think tidyeval should be abandoned, but I also do miss the old/easy way to do standard evaluation.
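For the simple cases where the underscore verbs were handy, here is a sketch of tidyeval equivalents, assuming dplyr and rlang are installed; `col` and `new_col` are hypothetical variables holding strings. `sym()` turns a string into a symbol for `!!`, and `:=` allows an unquoted name on the left-hand side:

```r
library(dplyr)
library(rlang)

col <- "mpg"
new_col <- "mpg_doubled"

# Filter on a column whose name is stored in a string
high <- filter(mtcars, !!sym(col) > 25)

# Name the new column from one string and compute it from another
out <- mutate(mtcars, !!new_col := !!sym(col) * 2)
```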

nwerth · October 25, 2017, 7:19pm

Every so often, I give tidyeval a shot when working on projects with coworkers who stick to the tidyverse. And every time, I give up and rewrite the function to use standard evaluation with character vectors. Most of the time, I had started writing a function that works on columns inside a dataset because that was the easiest way to wrap a function around existing code. In those cases, the better function worked on vectors.
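A sketch of the vector-first approach described above, assuming dplyr is installed; `rescale01` is a hypothetical helper. Because the function takes a plain vector, it needs no tidyeval at all, and the bare column name only appears in the interactive `mutate()` call:

```r
library(dplyr)

# Works on any numeric vector: rescale to the range [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

out <- mutate(mtcars, mpg01 = rescale01(mpg))
```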

But non-standard evaluation is required for a lot of stuff: dbplyr, data.table, ggplot, etc. While it isn't the best tool for most of us, it's probably good for that kind of stuff. If they use custom tools instead of R's built-in NSE, I'll have to assume there's some benefit.

It's like the openssl package: I can't imagine ever needing to call its functions directly, but httr uses it, so I hope it's not abandoned.

yutannihilation · October 26, 2017, 1:37am

Thanks nakamichi for the nice comparison! (Sorry for mentioning SAS which I don't know well...)

Agreed. Non-standard evaluation is a very common concept and has never been this easy. I want to congratulate the authors!

One thing, there are some kinds of people who don't like a "new notation" because they feel it is

For example, whether we should use %>% or not is (was?) always a topic of debate. I don't blame them, but when I explain some notation to someone, I'm always a bit afraid that the notation itself will be a cognitive burden for them. (Maybe this is just because I poorly understand non-standard evaluation.)

dylanjm · October 26, 2017, 3:01am

I'm honestly still trying to figure out what exactly tidyeval is. I don't think I've run into an instance where I've needed it. Has anybody found a really simple way to think about what exactly it is?

pavopax · October 26, 2017, 6:06am

OP here. Thanks all for the valuable comments.

I think what I’m getting at is that tidyeval, as people say above, seems much less intuitive and clear compared to everything else in dplyr.

So I’m wondering, do others feel that in terms of syntax, tidyeval is the best it can be, at least for the foreseeable future?

Or can/should we perhaps label it “beta” and try to figure out some better syntax?

martin.R · October 26, 2017, 8:24am

I would like to expand @dylanjm's question:
What is the difference between standard evaluation, non-standard evaluation, lazyeval and tidyeval?

Is there a concise explanation with relation to R without any programming theory?

I think I understand elements of them but I would appreciate a proper explanation and/or example.

I do share what seems to be the sentiment from some in this thread that dplyr went from something very simple to perplexing when the dplyr programming guide was added.
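One compact way to see the difference, sketched with base R and dplyr (assuming dplyr and rlang are installed; `col_mean` is a hypothetical name). Standard evaluation names the column with a value such as a string; base NSE evaluates the bare name you typed inside the data; tidyeval captures that bare name as a quosure and unquotes it with `!!`:

```r
library(dplyr)
library(rlang)

# Standard evaluation: the column is named by a value (a string)
se <- mean(mtcars[["mpg"]])

# Base-R non-standard evaluation: with() evaluates the bare name inside mtcars
nse <- with(mtcars, mean(mpg))

# Tidy evaluation: capture the argument as a quosure, then unquote it with !!
col_mean <- function(df, col) {
  col <- enquo(col)
  summarise(df, m = mean(!!col))$m
}
te <- col_mean(mtcars, mpg)
```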

mara · October 26, 2017, 12:59pm

It doesn't cover everything, but @Edwin did a great write-up on tidyeval and base NSE:

I also stashed a bunch of tidyeval resources I found helpful in this post (I think there are 12 or so write-ups in there now; however many, it's way more than pictured):

nutterb · October 26, 2017, 1:04pm

Not having spoken with the authors about their motivations, I can only describe what my impressions have been with respect to those motivations.

The primary advantage and motivation of many of the tidyverse tools is that the need to quote variable names is removed through the use of NSE. This produces what many consider "cleaner" looking code. Compare

```r
select(data, "col1", "col2", "col3", "col4")
select(data, col1, col2, col3, col4)
```

One of the side benefits of using NSE is that you can define convenient behaviors like select(data, col1:col4). In terms of interactive use, this was fair enough. The problem is that these concepts don't play well on the programming side.
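A sketch of that comparison on real data, assuming dplyr is installed and using the built-in `mtcars` data set: the quoted, bare, and range forms all select the same four adjacent columns:

```r
library(dplyr)

a <- select(mtcars, "mpg", "cyl", "disp", "hp")  # quoted names
b <- select(mtcars, mpg, cyl, disp, hp)          # bare names (NSE)
d <- select(mtcars, mpg:hp)                      # range of adjacent columns
```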

Having tried ages ago to do some NSE, I gave up after a few weeks because it quickly became apparent that I didn't have the time to implement NSE versions of my functions and then maintain them. I needed solutions that worked, worked now, and would be easy to maintain. So I stuck with standard evaluation just to take one level of difficulty off of my plate. With tidyeval, I might eventually go back to trying NSE.

The other thing that tidyeval seems to be targeting is the elimination of the "no visible binding for global variable" notes produced by R CMD check. See the comments to this answer from Stack Overflow (https://stackoverflow.com/a/12429344/1017276). Eliminating these notes without resorting to utils::globalVariables should make for better code (unless I'm misunderstanding something). Personally, I think this is the biggest perk of tidyeval. The aesthetics are nice, but code that behaves the way we want without having to use back doors to pass CRAN ranks much higher on my list.

martin.R · October 26, 2017, 1:06pm

@mara, thank you so much for coming to the rescue of another poster!

I remember reading the first link previously, but your resource links are a treasure trove.

mara · October 26, 2017, 1:07pm

Thanks! I'm trying to figure out a better way of doing them so that new material doesn't get buried in old blog posts (like this one!) So, any ideas or improvements are always welcome!

jennybryan · October 26, 2017, 6:40pm

True, but the typical user of dplyr should never need to know that tidyeval even exists. By definition, if you're programming around dplyr, i.e. writing functions that call dplyr functions, you're not the typical user.

I think these comments are spot on. I too am actively working at learning tidyeval (and rlang more generally) and thus feel the same pain as everyone else here. It's acknowledged that there's still a long way to go re: documentation and learning paths.

I think there are 2 types of people who need to learn more about tidyeval:

Wants to program around dplyr functions (or tidyr functions or any tidyverse pkg that uses NSE and has been switched over)
Wants their own package to have a user interface that feels like dplyr, tidyr, etc.
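For the second group, here is a minimal sketch of a function with a dplyr-like interface, assuming rlang is installed; `pull_expr` and `factor2` are hypothetical names. `enquo()` captures the user's expression together with its environment, and `eval_tidy()` evaluates it with the data frame's columns in scope:

```r
library(rlang)

# Evaluate an arbitrary expression in the context of a data frame
pull_expr <- function(df, expr) {
  q <- enquo(expr)          # capture, e.g., `mpg * factor2`, without evaluating it
  eval_tidy(q, data = df)   # columns of df are visible, and so are the caller's variables
}

factor2 <- 2
res <- pull_expr(mtcars, mpg * factor2)
```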

I am finding it much more rewarding to learn and more predictable than the methods we previously used to program around dplyr, i.e. the now-deprecated "underscore" verbs and some of the other examples above. I'm also finding tidyeval study useful as a gateway to delving into rlang more deeply. Which, in turn, is a very rewarding package for computing on the language in general.

Yeah, the "underscore" verbs were probably easier to get started with quickly. But I think they were sort of a limited, one-off solution. Whereas tidyeval is part of a bigger and more coherent project. What you learn will be applicable across many packages and provides an entry point to a very powerful set of tools for computing on the language (rlang).

hadley · October 26, 2017, 6:57pm

I am 100% confident that tidy evaluation is the correct theoretical underpinning for non-standard evaluation (NSE). NSE isn't great as a term because it is defined by what it is not, and is hence not specific: there are very many different ways of doing NSE; there's one way of doing tidy evaluation.
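To make the idea of a quosure concrete, here is a sketch assuming dplyr and rlang are installed; `make_filter` and `threshold` are hypothetical names. A quosure stores an expression together with the environment it was created in, so `threshold` is still found after `make_filter()` has returned:

```r
library(dplyr)
library(rlang)

make_filter <- function() {
  threshold <- 20
  quo(mpg > threshold)   # expression plus the environment where `threshold` lives
}

q <- make_filter()
res <- filter(mtcars, !!q)  # `mpg` comes from the data, `threshold` from the quosure
```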

I think we're ~80% of the way to the right tools. I don't think the existing tools (e.g. functions in rlang) are perfect, but they seem to do a decent job of solving most problems that I come across. And the combination of purrr and quasiquotation is a thing of beauty.
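As one illustration of combining purrr with quasiquotation, here is a sketch assuming dplyr, purrr and rlang are installed; `cols` and `means` are illustrative names. `map()` builds one `mean()` expression per column, and `!!!` splices the whole named list into a single `summarise()` call:

```r
library(dplyr)
library(purrr)
library(rlang)

cols <- c("mpg", "hp")

# One expression per column: mean(mpg), mean(hp)
means <- map(syms(cols), function(s) expr(mean(!!s)))
names(means) <- paste0("mean_", cols)

# Splice the named list of expressions into summarise()
res <- summarise(mtcars, !!!means)
```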

I think we're maybe 40% of the way to the right approach to teaching tidy eval. This is something we're still working on - Lionel and I have been giving talks to try out different ways of motivating and teaching tidy eval, and I think we have a better approach than is currently used in the (e.g.) dplyr documentation. I'm also working on the 2nd edition of advanced R which is going to have 4-5 chapters on tidy eval (and related ideas). The material is still very rough, but you can have a look starting at https://adv-r.hadley.nz/expressions (the pictures, at least, might be helpful).

martin.R · October 26, 2017, 6:58pm

Thanks. This is a helpful summary for my understanding of where tidyeval fits in:

rensa · October 27, 2017, 11:12am

Oh man, spooky timing—I came on tonight with the intention of writing some (hopefully) gentle criticism of the tidyeval docs myself.

I found this SO question tonight after hitting a similar problem myself. It's on renaming a column to the value of a string. The existing answers either refer to the 'SE dplyr' verbs or to alternate methods, like using base R or data.table.

I've looked at the programming vignette a few times in search of an answer to this (after all, tidyeval is supposed to be a simpler solution to problems like this) and found it difficult to understand. Eventually, after staring at it for an hour, I worked out what seems to be a solution. But it's not a particularly satisfying solution, because I don't know how it works.

By that I don't mean that I don't understand the internal mechanics; they're not the problem. I don't know how magrittr pipes work under the hood, but I do understand the syntactic equivalence (or sugar) they provide, and that understanding allowed me to start building a mental model of the rest of the tidyverse when I started playing around with readr and lubridate.

In this case I don't understand why I'm doing what I'm doing. I vaguely understand that !! "unquotes" its operand, evaluating it in place, and I have a vague feeling that I should be using this to substitute the value of a variable for the variable itself, but I've no idea why I also need the := operator to make it work. Because I don't understand the rules at play, I can't build any sort of mental model, so I can't generalise this knowledge to start learning other parts of tidyeval.
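A minimal sketch of the rename case, assuming dplyr 0.7 or later and rlang are installed; `new_name` is a hypothetical variable. The `:=` is needed because R's parser only accepts a plain name or string as an argument name, so writing `!!new_name = mpg` inside a call is a syntax error. `:=` is parsed by R but has no built-in meaning, which leaves rlang free to treat its left-hand side as something to unquote:

```r
library(dplyr)
library(rlang)

new_name <- "miles_per_gallon"

# `!!new_name` supplies the new column name from the string; `mpg` is the old column
res <- rename(mtcars, !!new_name := mpg)
```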

Unfortunately, I can't really help with improving this situation until I actually do understand tidyeval better myself. But it's a relief to know that this is seen as a problem, because it does currently feel a bit like the sort of 'learn these massive concepts before you can do something that feels like it ought to be simple' attitude that the tidyverse appears to rail against.