Is tidy evaluation compatible with default parameters?

Andrea · August 29, 2018, 5:23pm

After learning the enquo/!! pattern, I got hooked on tidy evaluation: I still don't grok it, but I have to say, it's soooo much better than lazyeval!! As always happens when I start learning something new, I hit a stumbling block: default parameters. When the symbol being quoted/unquoted is a function parameter with a default value, I can't get tidy evaluation to work. I guess it has something to do with lazy evaluation, but the truth is that I don't get what's going on.

Here's an example: don't worry about the function that gets and wrangles the data (but definitely feel free to look at the data if you're a football fan, they're fun!). The only function you need to care about is traceplots_by_factor:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(ggplot2)
library(rio)

# function to get & wrangle data: you don't have to care about this
get_and_wrangle_nfl_data <- function(){
  # get the data
  download.file("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018-08-28/nfl_2010-2017.csv",
                "nfl_2010-2017.csv")
  nfl <- import("nfl_2010-2017.csv")
  
  # wrangle the data
  nfl$V1 <- NULL
  nfl <- nfl %>%
    mutate(time = ymd(game_year, truncated = 2L) + months(8) + (game_week - 1) * weeks(1)) %>%
    filter(name == name[1]) %>%
    gather(key = variable, value = value, -name, -time, -position)
}

# plotting function with defaults
traceplots_by_factor <- function(dataframe_tall, x_var, y_var, var,
                                 factor_var = NULL, factor_values = NULL){
  x_var  <- enquo(x_var)
  y_var  <- enquo(y_var)
  var    <- enquo(var)
  if (!is.null(factor_var)) {
    factor_var     <- enquo(factor_var)
    dataframe_tall <- filter(dataframe_tall, !! factor_var %in% factor_values)
  }
  
  p <- ggplot(dataframe_tall, aes(x = !! x_var, y = !! y_var)) +
    geom_point(color = "blue") +
    facet_wrap(vars(!! var), scales = "free_y") +
    guides(col = guide_legend(ncol = 1))
  p
}

# get & wrangle data
nfl <- get_and_wrangle_nfl_data()
positions <- unique(nfl$position)

# this plot works
traceplots_by_factor(nfl, time, value, variable)


# this doesn't work
traceplots_by_factor(nfl, time, value, variable, factor_var = position, factor_values = positions[1])
#> Error in traceplots_by_factor(nfl, time, value, variable, factor_var = position, : object 'position' not found

Created on 2018-08-29 by the reprex package (v0.2.0).

In other words, if I let factor_var keep its default value NULL, the plotting function runs, but if I pass position as factor_var, it doesn't work anymore. Why?

mishabalyasin · August 29, 2018, 6:59pm

One of the most important things in debugging is to localize. For example, in your case the error is happening before anything interesting happens.
Another thing that you've seen a couple of times here, but I'll repeat once again, is the virtue of a reproducible example. You are almost there, since your question already uses the reprex package, but ideally you would think a little about how your question can be put into a reprex using built-in datasets (iris, mtcars and so on).
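As a rough sketch of what such a reprex might look like, the same failure can be reproduced in a few lines with iris. The function name filter_by is made up for illustration, and only dplyr and the built-in iris dataset are assumed:

```r
library(dplyr)

# Hypothetical stand-in for the original plotting function: same pattern,
# is.null() is called on the argument before enquo() captures it.
filter_by <- function(df, factor_var = NULL, factor_values = NULL) {
  if (!is.null(factor_var)) {            # forces evaluation of factor_var
    factor_var <- enquo(factor_var)
    df <- filter(df, !!factor_var %in% factor_values)
  }
  df
}

# Default NULL: nothing to look up, so this runs and keeps all rows
nrow(filter_by(iris))
#> [1] 150

# Bare column name: is.null() tries to evaluate `Species` in the calling
# environment, where no such object exists, so the call errors.
# tryCatch() turns the error into its message so the script keeps running.
res <- tryCatch(
  filter_by(iris, factor_var = Species, factor_values = "setosa"),
  error = function(e) conditionMessage(e)
)
res
```

The exact wording of the message in res depends on the R locale, which is why no output is shown for it here.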
That being said, here is how you can fix your function:

library(tidyverse)

example <- function(filter_var = NULL, filter_values = NULL){
  filter_var <- enquo(filter_var)
  if (!rlang::quo_is_null(filter_var)) {
    iris <- filter(iris, !!filter_var %in% filter_values)
  }
  as_tibble(iris)
}

example()
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 140 more rows

filter_values <- unique(iris$Species)
example(Species, filter_values[1])
#> # A tibble: 50 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 40 more rows

Created on 2018-08-29 by the reprex package (v0.2.0).
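The same change should carry over to the original traceplots_by_factor. What follows is only a sketch under a few assumptions: the legend tweak from the original is omitted, and ggplot2's built-in mpg dataset (with columns displ, hwy, drv and class) stands in for the NFL data so that nothing has to be downloaded.

```r
library(dplyr)
library(ggplot2)

# Sketch of the plotting function with the fix applied: factor_var is
# captured with enquo() first, and the NULL check is done on the quosure.
traceplots_by_factor <- function(dataframe_tall, x_var, y_var, var,
                                 factor_var = NULL, factor_values = NULL) {
  x_var      <- enquo(x_var)
  y_var      <- enquo(y_var)
  var        <- enquo(var)
  factor_var <- enquo(factor_var)
  if (!rlang::quo_is_null(factor_var)) {
    dataframe_tall <- filter(dataframe_tall, !!factor_var %in% factor_values)
  }

  ggplot(dataframe_tall, aes(x = !!x_var, y = !!y_var)) +
    geom_point(color = "blue") +
    facet_wrap(vars(!!var), scales = "free_y")
}

# mpg is swapped in here for the nfl data frame of the original question
p_all     <- traceplots_by_factor(mpg, displ, hwy, drv)
p_compact <- traceplots_by_factor(mpg, displ, hwy, drv,
                                  factor_var = class,
                                  factor_values = "compact")
```

Both calls build a plot object; the second one is built from only the rows of mpg whose class is "compact".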

As you can see, the problem is that your argument is evaluated immediately when you call is.null. When it is, in fact, NULL, everything is fine and the function merrily continues on its way. But when it is not NULL, is.null tries to find an object named Species (position in your case; this is what your error says) and fails, since there is no such object in scope. Calling enquo first captures the expression without evaluating it, and rlang::quo_is_null then checks whether the captured expression is NULL.
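To make the difference concrete, here is a minimal sketch that isolates the two behaviours. The helper names forces_arg and quotes_arg are invented for this illustration; is.null, enquo and quo_is_null are the only real functions involved.

```r
library(rlang)

# Hypothetical helpers, for illustration only
forces_arg <- function(x = NULL) is.null(x)             # evaluates x
quotes_arg <- function(x = NULL) quo_is_null(enquo(x))  # captures x unevaluated

forces_arg()         # TRUE: the default NULL is evaluated without trouble
quotes_arg()         # TRUE: the default is captured as a quosure wrapping NULL
quotes_arg(Species)  # FALSE: Species is quoted and never looked up

# forces_arg(Species) fails, because R has to look up Species before
# is.null() can test it; tryCatch() records that an error occurred
out <- tryCatch(forces_arg(Species), error = function(e) "error")
out                  # "error"
```

So the check itself is not the problem; what matters is whether the argument is still unevaluated at the point where it is quoted.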

Hope that helps.

Andrea · August 30, 2018, 10:17am

Thanks @mishabalyasin! That solved my problem.

One of the most important things in debugging is to localize.

Good point - I tried to simplify the function with respect to my "production" code, but I didn't simplify it enough: in other words, it wasn't a Minimal Example.

Another thing that you've seen a couple of times here, but I'll repeat once again, is the virtue of a reproducible example. You are almost there, since your question already uses the reprex package, but ideally you would think a little about how your question can be put into a reprex using built-in datasets (iris, mtcars and so on).

Well, my example is certainly not minimal, but it is fully reproducible - it downloads a small dataset from a trustworthy place, the R4DS community site. I admit I used that dataset mainly because I liked it and I thought someone else might find it fun (it certainly isn't the actual dataset I'm working on for my project), but point taken - next time I'll stick to built-in datasets.

mishabalyasin · August 30, 2018, 11:37am

There are many people (and I'm one of them) who don't want to download any data from remote sources without a good reason to do so.
It is also often a useful exercise to check whether your problem can be replicated with built-in datasets, since you can come across the solution inadvertently while doing so. It has happened to me more than once, that's for sure.