from do() to future_map_dfr() with tidyeval

Hi,

I started by creating a tidyeval function that I can use to do least-squares fitting and apply business rules. I wanted it to be generic, so I specify which columns are the data inputs for the model. It returns a tibble with the parameters of the fitted model:

fitPL <- function(data, conc, signal) {
     conc <- enquo(conc)
     signal <- enquo(signal)
     #
     # calculations here using !!conc and !!signal
     # ... 
     # 
     return(ti.res)
}

This now works perfectly well if I call it using do(), given that Condition, Concentration and Result are columns of ti.data:

    ti.fits <- ti.data %>%
        group_by(Condition) %>%
        do(fitPL(., Concentration, Result))

However, fitPL() is fairly slow, and the problem is embarrassingly parallel, so I would like to call it as, say:

    ti.fits <- ti.data %>%
        split(.$Condition) %>%
        future_map_dfr(.f = fitPL, conc = Concentration, signal = Result)

This of course gives an error, because 'Concentration' does not exist as an object where future_map_dfr() is called. But it also gives an error if I pass the column name as a character value, because then fitPL() tries to do its calculations using the character value itself as its (only) input.

Do I need to wrap the column parameters some way? Or do I need to rewrite fitPL so that it also (?) accepts character inputs? Can I have it both ways (i.e. passing these parameters with or without quotes)?

Just for extra clarification: future_map_dfr() is part of the furrr package.

I'm really keen to hear from a tidyeval expert on this one, because the way I've seen enquo() explained, it uses "black magic" to retrieve the expression originally passed by the user. I have to wonder how (or if) that can still happen when enquo() is called in a potentially entirely different R process from the one where the expression was written.
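
To make the "black magic" a bit more concrete, here is a minimal sketch of what enquo() hands back. The helper name capture() is made up for illustration; the rest uses rlang accessors. It only shows that a quosure bundles the unevaluated expression with an environment, and says nothing definitive about how furrr ships either of them to a worker process.

```r
library(rlang)

# Hypothetical helper: capture the argument and return the quosure itself
capture <- function(x) enquo(x)

# 'Concentration' does not need to exist: nothing is evaluated here
q <- capture(Concentration + 1)

quo_get_expr(q)  # the expression exactly as the caller wrote it
quo_get_env(q)   # the environment the expression was written in
```

So the question in a parallel setting is presumably whether that expression (and whatever it needs from its environment) is available where fitPL() actually runs.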

@Emmanuel, have you tested this with purrr::map_dfr() rather than do()? It might help to rule out any other syntax problems.

What about something like this:

library(tidyverse)

fitPL <- function(data, conc, signal) {
  conc <- enquo(conc)
  signal <- enquo(signal)
  
  ti.res <- data %>% 
    select(!!conc, !!signal) %>% 
    mutate(new_col = !!conc + !!signal)
  
  return(ti.res)
}


mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(data = map(data, ~fitPL(.x, mpg, qsec))) %>% 
  unnest()

#> # A tibble: 32 x 4
#>      cyl   mpg  qsec new_col
#>    <dbl> <dbl> <dbl>   <dbl>
#>  1     6  21    16.5    37.5
#>  2     6  21    17.0    38.0
#>  3     6  21.4  19.4    40.8
#>  4     6  18.1  20.2    38.3
#>  5     6  19.2  18.3    37.5
#>  6     6  17.8  18.9    36.7
#>  7     6  19.7  15.5    35.2
#>  8     4  22.8  18.6    41.4
#>  9     4  24.4  20      44.4
#> 10     4  22.8  22.9    45.7
#> # ... with 22 more rows

Created on 2019-01-23 by the reprex package (v0.2.0).

Since future_map() has all the functionality of map(), you should be able to just replace one with the other. This assumes that fitPL() returns a tibble.
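
As a rough sketch of that swap, the same mtcars example might look like the following with furrr. The plan() call with two local workers is an assumption about the setup (any plan supported by the future package should do), and unnest(cols = data) is the spelling that newer tidyr versions ask for; the toy fitPL() is the same as in the reprex.

```r
library(dplyr)
library(tidyr)
library(furrr)

# Assumption: run on two background R sessions on the local machine
future::plan(future::multisession, workers = 2)

# Same toy fitPL() as in the reprex: adds the two chosen columns
fitPL <- function(data, conc, signal) {
  conc <- enquo(conc)
  signal <- enquo(signal)
  data %>%
    select(!!conc, !!signal) %>%
    mutate(new_col = !!conc + !!signal)
}

res <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  # future_map() in place of map(); the bare column names stay inside the lambda
  mutate(data = future_map(data, ~ fitPL(.x, mpg, qsec))) %>%
  unnest(cols = data)

# Return to sequential processing when done
future::plan(future::sequential)
```

Because the bare column names are written inside the lambda, fitPL() sees them as its own arguments on each call, rather than having them passed through future_map()'s extra arguments.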

Yes, this works! The call becomes

    ti.fits <- ti.data %>%
        group_by(Experiment, Tag) %>%
        nest() %>% 
        mutate(
            data = furrr::future_map(data, ~fitPL(.x, Concentration, Result), .progress = TRUE)
        ) %>% 
        unnest()

BTW, I also managed to get my original syntax to work by making two changes: Using ensym() instead of enquo() inside fitPL() to encapsulate the parameters, and passing the parameters as character values. (I don't pretend to fully understand why, but I guess it has something to do with enquo() retaining a reference to the original environment, while ensym() does not.)

    ti.fits <- ti.data %>%
        split(.$Condition) %>%
        future_map_dfr(.f = fitPL, conc = "Concentration", signal = "Result", .progress = TRUE)
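
On the "can I have it both ways" question: ensym() accepts either a bare column name or a string and turns both into a symbol, so a function written with it can take the parameter with or without quotes. A minimal sketch, using a made-up stand-in called fitSketch() (not the real fitPL()) and mtcars columns:

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for fitPL(): only computes a mean ratio of two columns
fitSketch <- function(data, conc, signal) {
  conc <- ensym(conc)      # accepts a bare name or a string
  signal <- ensym(signal)
  data %>% summarise(ratio = mean(!!signal / !!conc))
}

bare   <- fitSketch(mtcars, mpg, qsec)      # unquoted column names
quoted <- fitSketch(mtcars, "mpg", "qsec")  # column names as strings

identical(bare, quoted)
```

The trade-off is that ensym() only accepts a single column name: an expression such as log(Concentration) is rejected, whereas enquo() would capture it.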

However, tbradley's answer is the better solution. It has the advantage of retaining all grouping columns, while in my original code I could only split on one column.

Thank you!
