When using the separate() function from tidyr with colleagues who were new to the tidyverse (and R), I tried to explain why its arguments are provided the way they are, and became curious about when non-standard evaluation should be used in functions, and why.
With tidyr::separate(), for example, the column to be separated (the argument col) is provided without quotation marks, whereas the names of the columns it is separated into are provided in a character vector:
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
df
#>      x
#> 1 <NA>
#> 2  a.b
#> 3  a.d
#> 4  b.c
df %>% separate(x, c("A", "B"))
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c
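To make the two argument styles concrete, here is a minimal sketch that rebuilds the same df. It assumes a reasonably recent tidyr in which col is resolved through tidyselect; under that assumption a quoted column name seems to be accepted as well, whereas into is plain data (a character vector) and can therefore be built programmatically. The names res_bare, res_string, res_named and new_names are mine and only for illustration.

```r
library(tidyr)

df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))

# `col` given as a bare (unquoted) column name, as in the example above
res_bare <- separate(df, x, into = c("A", "B"))

# Assumption: `col` is resolved via tidyselect, so a string selects the same column
res_string <- separate(df, "x", into = c("A", "B"))

# `into` is an ordinary character vector, so it can be constructed in code
new_names <- paste0("part_", 1:2)
res_named <- separate(df, x, into = new_names)

identical(res_bare, res_string)
names(res_named)
```

If this holds, the difference may be less about quoting as such and more about whether an argument refers to an existing column or simply carries new names as data, though I am not sure that is the intended design rationale.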
I don't think this is unique to separate(), though maybe it is and there is a specific reason why.
I thought one reason might be that the column to be separated already exists in the data frame, whereas the columns it is separated into do not exist (yet), and so that may be why the new column names are provided in a character vector. However, for other functions, like select() and mutate() in dplyr, the names for the new variables / columns are provided without quotation marks, e.g. dplyr::mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width).
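For comparison, here is a small sketch of the mutate() case. As far as I can tell, the new column name there is just an argument name, which is why it does not need to exist beforehand. When the name is held in a string, the glue-style syntax with := appears to be the supported route, assuming a recent dplyr and rlang. The names area_bare, area_string and new_name are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# New column name supplied as a bare argument name
area_bare <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)

# Hypothetical variable holding the new column name as a string
new_name <- "Sepal.Area"

# Assumption: a recent dplyr/rlang, where glue syntax on the left of `:=` is supported
area_string <- mutate(iris, "{new_name}" := Sepal.Length * Sepal.Width)

identical(area_bare, area_string)
```

So both styles seem to be available for new names in mutate(), which makes the choice in separate() look more like a design decision than a technical necessity.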
I ask in part out of curiosity and also because I would like to be consistent with how others use non-standard evaluation and how it is used in tidyverse packages. I also ask because, while there are good discussions and resources on the why of non-standard evaluation (via tidyeval) and the how, I am less familiar with tips on the when.
Thank you for your pointers or feedback.