Programming with dots
Collecting dots
Dots only work within functions, as they're defined by arguments passed to a function. To see what they look like, you have to collect them, e.g.
collect_dots <- function(...){
    list(...)
}
collect_dots('a')
#> [[1]]
#> [1] "a"
str(collect_dots(3:5, cos(pi), list('hi'))) # str prints lists more compactly
#> List of 3
#> $ : int [1:3] 3 4 5
#> $ : num -1
#> $ :List of 1
#> ..$ : chr "hi"
Calling list on the dots collects them into a single object, evaluating them in the process (note that cos(pi) is now -1).
Accessing dots
An alternative to collecting the dots is to use the numeric accessors. ..1 refers to the first argument passed to the ... parameter, ..2 to the second, and so on. R 3.5.0 added a ...elt() function, where ...elt(2) is equivalent to ..2, and a ...length() function, which returns the number of arguments in the dots. For documentation, see ?dots.
second <- function(...){
    message('2nd of ', ...length())
    ..2
}
second(1:4, 'b', tan(0))
#> 2nd of 3
#> [1] "b"
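As a small illustration of the accessors, the sketch below wraps ...elt() and ...length() in a helper. The name nth_dot is hypothetical, and the code assumes R 3.5.0 or later.

```r
# Hypothetical helper: return the n-th argument passed through the dots.
nth_dot <- function(n, ...) {
    if (n > ...length()) {
        stop("asked for argument ", n, " but only ", ...length(), " supplied")
    }
    ...elt(n)  # evaluates only the n-th argument
}

nth_dot(2, 'a', 'b', 'c')
#> [1] "b"

# The first argument is never evaluated, so this does not throw:
nth_dot(2, stop("never evaluated"), 'b')
#> [1] "b"
```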
What makes this notation useful is that it evaluates only the argument referred to, not every argument the way list does. This matters because sometimes you don't want a long-running argument to be evaluated. For example, the first of these calls takes 3 seconds because Sys.sleep(3) gets evaluated, whereas the second is effectively instantaneous because Sys.sleep(3) never gets evaluated:
system.time(collect_dots(Sys.sleep(3), 'hlo'))
#> user system elapsed
#> 0.000 0.000 3.003
system.time(second(Sys.sleep(3), 'howdy'))
#> 2nd of 2
#> user system elapsed
#> 0.003 0.000 0.003
This is consistent with functions' treatment of parameters, which are only evaluated if they are referred to in the function. A function that doesn't use any parameters will never evaluate what you pass it, e.g.
one <- function(x) 1
one(stop("Error!"))
#> [1] 1
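To make the "only evaluated if referred to" rule concrete, here is a minimal sketch in which the same argument is skipped or forced depending on a flag. The function maybe_use and its use parameter are hypothetical, for illustration only.

```r
# Hypothetical function: x is only evaluated when `use` is TRUE.
maybe_use <- function(x, use = FALSE) {
    if (use) x else "skipped"
}

maybe_use(stop("Error!"))
#> [1] "skipped"

# Referring to x forces the argument, so the error now surfaces:
tryCatch(
    maybe_use(stop("Error!"), use = TRUE),
    error = function(e) conditionMessage(e)
)
#> [1] "Error!"
```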
Collecting dots without evaluation
There are ways to collect dots without evaluating them, but this steps into operating on the language, which is a more advanced topic.
A quick example of collecting dots without evaluation (ignore if you like)
collect_dots_2 <- function(...) substitute({...})
str(collect_dots_2('hi', 2, tan(pi)))
#> language { "hi"; 2; tan(pi) }
Using alist instead of braces is more practical for operating on the calls, but the version above better illustrates what's happening. match.call also collects the dots without evaluating them (as part of the whole call); see ?match.call.
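For the curious, the alist and match.call approaches might look like the sketch below. The names collect_dots_3 and collect_dots_4 are hypothetical, and this is one common idiom rather than the only way to do it.

```r
# Hypothetical names; both return the dots as a list of unevaluated expressions.
collect_dots_3 <- function(...) eval(substitute(alist(...)))
collect_dots_4 <- function(...) as.list(match.call(expand.dots = FALSE)$...)

exprs <- collect_dots_3('hi', 2, tan(pi))
exprs[[3]]         # still a call, not a number
#> tan(pi)
class(exprs[[3]])
#> [1] "call"
```

Each element can then be inspected, modified, or passed to eval when its value is actually needed.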
Passing dots to another function
Dots are also not evaluated when they're passed directly to another function (though they're usually evaluated by that function). Because they're not collected, they can be spliced into the parameters of the function they're passed to. For example, the following function is mean, but with na.rm = TRUE as the default. Because everything else is passed through ..., I can still pass a trim argument:
mean_without_NAs <- function(..., na.rm = TRUE){
    mean(..., na.rm = na.rm)
}
mean_without_NAs(c(0, NA, 47, 94))
#> [1] 47
mean_without_NAs(c(1, 2, 3, 100000), trim = 0.25)
#> [1] 2.5
This splicing behavior shows that collecting dots is really a special case of passing them: they're passed to a function that assembles them into an object, like list or c. Thus collect_dots above is effectively just an alias for list.
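A quick check of that claim, as a sketch: names supplied by the caller are spliced through the dots too, and swapping list for c changes only the kind of object assembled. The name combine_dots is hypothetical.

```r
collect_dots <- function(...) list(...)   # as defined earlier

# Names supplied by the caller travel through the dots unchanged.
identical(collect_dots(a = 1, b = 'x'), list(a = 1, b = 'x'))
#> [1] TRUE

# Passing the dots to c instead assembles an atomic vector.
combine_dots <- function(...) c(...)      # hypothetical name
identical(combine_dots(a = 1, b = 2), c(a = 1, b = 2))
#> [1] TRUE
```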
Note that when calling mean_without_NAs we still have to collect the values we'd like the mean of with c, as mean takes the values through its x parameter. We could make a version of mean that accepts dots (like sum) by collecting the dots inside the function (here with c instead of list, as mean takes a vector, not a list). To keep access to mean's other parameters, they now have to be added to the wrapper function explicitly, as the dots are passed to c for collection instead of on to mean.
mean_of_dots <- function(..., trim = 0, na.rm = FALSE){
    mean(c(...), trim = trim, na.rm = na.rm)
}
mean_of_dots(1, 5, 10, 47)
#> [1] 15.75
mean_of_dots(1, 3, NA, 5, na.rm = TRUE)
#> [1] 3
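To see why the collection step matters, the sketch below contrasts the wrapper with a bare call to mean. The wrapper is repeated so the snippet runs on its own; the bare call's result is deliberately not shown, since the point is only that it differs.

```r
mean_of_dots <- function(..., trim = 0, na.rm = FALSE) {
    mean(c(...), trim = trim, na.rm = na.rm)
}

# Collected first, so all four values are averaged.
mean_of_dots(1, 5, 10, 47)
#> [1] 15.75

# Without collection only the first value is x; the others are matched
# to mean's remaining parameters, so this is not the mean of four values.
bare <- mean(1, 5, 10, 47)
isTRUE(all.equal(bare, 15.75))
#> [1] FALSE
```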
What pmap does
What a data frame is
To understand what purrr::pmap does when applied to a data frame, you have to think of the data frame as a list. In fact, a data frame is a list, with a few restrictions (such as columns of equal length) and a bit of fanciness like row names. To see the underlying list, call unclass on a data frame:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
some_data <- tibble(
    x = 1:2,
    y = c('a', 'b')
)
str(unclass(some_data))
#> List of 2
#> $ x: int [1:2] 1 2
#> $ y: chr [1:2] "a" "b"
#> - attr(*, "row.names")= int [1:2] 1 2
You can call pmap on a non-data frame list of this same structure, and you'll get the same result: pmap doesn't care about the class, only the structure.
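As a rough check of that claim, the sketch below maps the same function over a data frame and over a plain list with the same columns. The names as_df and as_list are hypothetical, and the snippet assumes the purrr package is installed.

```r
library(purrr)

as_df   <- data.frame(x = 1:2, y = c('a', 'b'), stringsAsFactors = FALSE)
as_list <- list(x = 1:2, y = c('a', 'b'))

# Same structure, different class: the mapped results match.
identical(pmap(as_df, list), pmap(as_list, list))
#> [1] TRUE
```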
What pmap gets passed
To see what goes into the function passed to pmap, pass list as that function. It collects the arguments that would otherwise be spliced into whatever function you give pmap, so each element of the resulting list is the set of arguments for one call:
some_data %>%
    pmap(list) %>%
    str()
#> List of 2
#> $ :List of 2
#> ..$ x: int 1
#> ..$ y: chr "a"
#> $ :List of 2
#> ..$ x: int 2
#> ..$ y: chr "b"
(If you like, you can think of purrr::transpose(some_data) as a shortcut for pmap(some_data, list).)
Calling pmap on functions that take dots
To pass the data in some_data through pmap to a function that does more than list, let's try paste. Since paste always returns a character vector, we'll use the pmap_chr version, which simplifies the resulting list to a character vector for us:
some_data %>% pmap_chr(paste)
#> [1] "1 a" "2 b"
Because some_data has two rows (each element in the list has length two), paste gets called twice, and the whole call is equivalent to
c(paste(1, 'a'), paste(2, 'b'))
#> [1] "1 a" "2 b"
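If base R is more familiar, the same row-wise call can be sketched with Map and do.call. This is only an analogy for what happens to the arguments, not a description of how purrr implements pmap, and some_list is a hypothetical stand-in for some_data.

```r
some_list <- list(x = 1:2, y = c('a', 'b'))

# Map walks the elements in parallel; do.call splices them in as arguments.
unlist(do.call(Map, c(f = paste, some_list)))
#> [1] "1 a" "2 b"
```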
This doesn't do anything particularly useful in this case, but use-cases certainly exist.
Calling pmap on functions with named parameters
Also note that list and paste both accept dots, into which the arguments get spliced. If the function you're mapping does not accept data through dots, the names matter: the arguments are passed in with their names and are thus picked up by the corresponding parameters, just as in
do.call(mean, list(x = c(2, NA, 47), na.rm = TRUE))
#> [1] 24.5
TRUE gets passed to na.rm, not trim, despite the fact that trim is the second parameter, because the argument is named. Thus,
list(
    x = list(1:5, c(1, NA)),
    na.rm = c(TRUE, FALSE)
) %>%
    pmap_dbl(mean)
#> [1] 3 NA
If the names of the data frame or list don't line up with the parameter names, you won't get what you want unless you rename in some fashion. That is what the part of the link mentioned in the original post was about.
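Two ways the renaming might be done are sketched below, assuming the purrr package is installed. The list mismatched and its element names are hypothetical.

```r
library(purrr)

# Hypothetical data whose names do not match mean's parameters.
mismatched <- list(
    values    = list(1:5, c(1, NA)),
    remove_na = c(TRUE, TRUE)
)

# Option 1: rename the list so the names match x and na.rm.
mismatched %>%
    setNames(c('x', 'na.rm')) %>%
    pmap_dbl(mean)
#> [1] 3 1

# Option 2: map the names to parameters in a wrapper function.
pmap_dbl(mismatched, function(values, remove_na) mean(values, na.rm = remove_na))
#> [1] 3 1
```

The wrapper is more verbose but leaves the data untouched, which may be preferable when the same list feeds several functions.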