Create Pipeline of Function Calls from List

msaenger · March 25, 2018, 7:58am

I would like to create a pipeline of function calls from a list. The list contains the function names and arguments.

library(tidyverse)

# Example (hardcoded)
1:10 %>% cumsum %>% diff(lag = 2)

# [1]  5  7  9 11 13 15 17 19

# Function names and arguments defined in a nested list
opts = list(x = 1:10, chain = c(cumsum, diff), args = list(diff = list(lag = 2)))

How can I produce a pipeline of function calls based on the list?

Matt

fauxneticien · March 25, 2018, 9:09am

I'm not quite sure how to do it with your opts object as-is, especially since you can pass in unnamed functions like chain = c(cumsum, function(x, y) x + y). But if you are able to specify the arguments as a positional list (as opposed to a named list), you can use purrr's partial and map2 to partially apply those arguments to the respective functions, and then use magrittr::freduce to apply the resulting list of functions to some data, one after another:

library(magrittr)
library(purrr)

opts <- 
    list(x = 1:10,
         chain = c(cumsum,
                   diff),
         args = list(list(),
                     list(lag = 2)
         )
    )

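eval_pipeline <- function(opts) {
    # Partially apply each function's arguments (if any). Use `if` rather
    # than ifelse(), because ifelse() cannot return a function.
    map2(.x = opts$args,
         .y = opts$chain,
         .f = ~ if (length(.x) == 0) .y else do.call(partial, c(list(.y), .x))
         ) %>%
        # Apply the functions one after another, starting from opts$x
        freduce(
            value = opts$x,
            function_list = .
        )
}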

eval_pipeline(opts)
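For reference, freduce(value, function_list) simply applies each function in turn to the running value. A base-R sketch of the same idea with Reduce, where the lag argument is fixed in an anonymous function instead of with purrr::partial (the name steps is just for illustration):

```r
# Each step is a one-argument function; lag = 2 is fixed inside a closure
steps <- list(cumsum, function(x) diff(x, lag = 2))

# Apply the functions left to right, starting from 1:10
result <- Reduce(function(value, f) f(value), steps, 1:10)
result
#> [1]  5  7  9 11 13 15 17 19
```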

Looking forward to someone else's solution on doing it with your opts object as-is, though.
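One possible way to use the original opts object as-is, assuming every name in opts$args is the name of a function visible from where the code runs (anonymous functions such as function(x, y) x + y cannot be matched this way): look up each function's arguments by comparing it with identical() against the named functions. args_for() and run_chain() are hypothetical helper names, and this is only a sketch:

```r
# Original structure from the question: raw functions, args keyed by name
opts <- list(x = 1:10, chain = c(cumsum, diff), args = list(diff = list(lag = 2)))

# Hypothetical helper: return the args whose name refers to the same
# function object as `f`, or an empty list if there are none
args_for <- function(f, args) {
  for (nm in names(args)) {
    if (identical(f, match.fun(nm))) return(args[[nm]])
  }
  list()
}

# Hypothetical helper: call each function on the running value,
# adding any extra arguments found for it
run_chain <- function(opts) {
  Reduce(function(value, f) do.call(f, c(list(value), args_for(f, opts$args))),
         opts$chain, opts$x)
}

run_chain(opts)
#> [1]  5  7  9 11 13 15 17 19
```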

mara · March 25, 2018, 2:32pm

Tidy evaluation might come in handy here:
https://adv-r.hadley.nz/evaluation.html

JohnMount · March 25, 2018, 2:42pm

If the reason you are trying to convert from a list is to reuse the pipeline, you might try magrittr's "save the pipeline as a function" notation (a pipeline that starts with the . placeholder):

library("dplyr")
f <- . %>% cumsum %>% diff(lag = 2)
1:10 %>% f
## [1]  5  7  9 11 13 15 17 19
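A similar reusable function can also be built without the pipe, for example with purrr::compose(). This sketch assumes purrr 0.3.0 or later, where compose() has the .dir argument:

```r
library(purrr)

# Compose the steps into one function; .dir = "forward" applies them
# left to right (cumsum first, then diff with lag = 2)
g <- compose(cumsum, partial(diff, lag = 2), .dir = "forward")
g(1:10)
#> [1]  5  7  9 11 13 15 17 19
```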

alistaire · March 25, 2018, 4:12pm

You can use purrr::reduce (or just Reduce) to assemble the pieces, and rlang to munge the ingredients. I made one change to opts, storing the functions as expressions instead of raw functions; otherwise the function names aren't stored anywhere, so there is no way to know which parameters (such as lag) go with which function. You could use quosures instead of expressions, but since the data is determined by the pipeline structure rather than by any references, it is easier to use expressions and ignore environments.

Assembling the pipeline is not too bad; it's just a matter of reduce-ing the calls with .init set, splicing each call into the resulting expression. Adding the parameters is a little harder, but the heavy lifting can be done with rlang::call_modify. The parameters have to be subset out of opts$args, which means converting the input expression into a string to subset with. This can be done with expr_name(.y[[1]]), where the [[1]] drops the parentheses from the call (it extracts the function symbol). The parameters thus subset need to be unquote-spliced (!!!) into call_modify so they are passed as individual arguments, not as a list.

The resulting pipeline expression can be evaluated with rlang::eval_tidy or, because it is an ordinary expression, with plain old eval.

library(purrr)
library(rlang)

opts = list(x = 1:10, 
            chain = list(expr(cumsum()), expr(diff())), 
            args = list(diff = list(lag = 2)))

reduce(opts$chain, ~expr(!!.x %>% !!.y), .init = opts$x)
#> 1:10 %>% cumsum() %>% diff()

chain <- reduce(opts$chain,
                ~expr(!!.x %>% !!call_modify(.y, !!!opts$args[[expr_name(.y[[1]])]])), 
                .init = opts$x)

chain
#> 1:10 %>% cumsum() %>% diff(lag = 2)

eval(chain)   # or eval_tidy(chain)
#> [1]  5  7  9 11 13 15 17 19
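To see what happens for a single element of opts$chain, here is the expr_name() and call_modify() step on its own (cl, fn_name, args and modified are just illustrative names):

```r
library(rlang)

# One element of opts$chain: a call with no arguments yet
cl <- expr(diff())

# cl[[1]] is the symbol `diff`; expr_name() turns it into the string "diff"
fn_name <- expr_name(cl[[1]])
fn_name
#> [1] "diff"

# Look up the arguments for this function and splice them into the call
args <- list(diff = list(lag = 2))
modified <- call_modify(cl, !!!args[[fn_name]])
modified
#> diff(lag = 2)
```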

If you'd rather store the data as symbols instead of calls (i.e. without the parentheses), you can drop the [[1]] subsetting in the expr_name call, but you will need to call call2 on the symbol to turn it into a call (i.e. add the parentheses). call2 can also be used instead of call_modify to add the parameters:

opts = list(x = 1:10, 
            chain = list(expr(cumsum), expr(diff)), 
            args = list(diff = list(lag = 2)))

reduce(opts$chain, ~expr(!!.x %>% !!.y), .init = opts$x)
#> 1:10 %>% cumsum %>% diff

reduce(opts$chain, ~expr(!!.x %>% !!call2(.y)), .init = opts$x)
#> 1:10 %>% cumsum() %>% diff()

chain <- reduce(opts$chain,
                ~expr(!!.x %>% !!call2(.y, !!!opts$args[[expr_name(.y)]])), 
                .init = opts$x)

chain
#> 1:10 %>% cumsum() %>% diff(lag = 2)

eval(chain)   # or eval_tidy(chain)
#> [1]  5  7  9 11 13 15 17 19

call2 will accept a variety of inputs to specify the function, so the above will actually work fine if opts$chain is just a character vector of function names c("cumsum", "diff") without any modification to the code (though expr_name would be superfluous). If there were a way to figure out which args to get, it would work on the original data, too (though the intermediary code would look a bit uglier).
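For example, a sketch with opts$chain as a plain character vector; since the names are already strings, opts$args can be indexed with .y directly:

```r
library(purrr)
library(rlang)

opts <- list(x = 1:10,
             chain = c("cumsum", "diff"),  # function names as strings
             args = list(diff = list(lag = 2)))

# call2() turns each string into a call; args for names missing from
# opts$args are NULL, and splicing NULL adds nothing
chain <- reduce(opts$chain,
                ~ expr(!!.x %>% !!call2(.y, !!!opts$args[[.y]])),
                .init = opts$x)

chain
#> 1:10 %>% cumsum() %>% diff(lag = 2)

eval(chain)
#> [1]  5  7  9 11 13 15 17 19
```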

msaenger · April 2, 2018, 10:05am

Thanks everyone for the support. I slightly changed the structure in which I store the modifiers:

library(purrr)
library(dplyr)
library(rlang)  # expr() and call2() are rlang functions

# Define the data set
set.seed(1)
x <- data.frame(a = rep(1:2, 5), x = 1:10, y = runif(10))

# List the modifiers (pipeline)
opts = list(
  list(fct = expr(filter), args = list(expr(a == 1))),
  list(fct = expr(mutate), args = list(cumsum = expr(cumsum(y)))),
  list(fct = expr(top_n), args = list(n = 2))
)
# Evaluate the pipeline 
eval(reduce(opts, ~ expr(!!.x %>% !!call2(.y$fct, !!!.y$args)), .init = x))

#>   a x         y   cumsum
#> 1 1 7 0.9446753 1.984719
#> 2 1 9 0.6291140 2.613833

Could be used for instance in a shiny application where the user selects a couple of modifiers which are then applied to the data set and returned (as a plot).
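For that use case it may help to wrap the evaluation in a function. A minimal sketch, where apply_modifiers() is a hypothetical helper name and only the filter and mutate modifiers from above are used; in a shiny app, the list passed in would be whichever modifiers the user selected:

```r
library(purrr)
library(dplyr)
library(rlang)

# Hypothetical helper: build the pipeline expression from a list of
# modifiers (each with $fct and $args) and evaluate it
apply_modifiers <- function(data, modifiers) {
  pipeline <- reduce(modifiers,
                     ~ expr(!!.x %>% !!call2(.y$fct, !!!.y$args)),
                     .init = data)
  eval(pipeline)
}

set.seed(1)
dat <- data.frame(a = rep(1:2, 5), x = 1:10, y = runif(10))

modifiers <- list(
  list(fct = expr(filter), args = list(expr(a == 1))),
  list(fct = expr(mutate), args = list(cumsum = expr(cumsum(y))))
)

res <- apply_modifiers(dat, modifiers)
res
```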

Best regards, Matt

mara · April 2, 2018, 1:04pm

Since it looks like you're all set, would you mind choosing a solution (even if it's your own)? (see FAQ below for how) It makes it a bit easier to visually navigate the site and see which questions still need help.

Thanks
Mara