Including a progress bar in a drake plan step

Hey all! I'd like advice on how to include a progress bar in the longer tasks of a drake plan.

I've made up an example based on a common pattern at work. My long-running functions (slow because I run them many, many times over) take a progress bar object (pb below) as an argument, like the one below.

library(drake)
library(tidyverse)

calc_mean <- function(df, var, pb){
  var <- enquo(var)
  pb$pause(0.2)$tick()$print()
  df %>% 
    sample_n(nrow(df), replace = TRUE) %>%
    summarise(m = mean(!!var))
}

Normally, I'd create a progress object by assigning it ahead of time. I tend to reuse the same variable name, since each object's ticks get "used up" by previous calls.

n1 <- 10
p <- progress_estimated(n1)
map_df(1:n1, ~calc_mean(df = mtcars, var = mpg, pb = p))

n2 <- 20
p <- progress_estimated(n2)
map_df(1:n2, ~calc_mean(df = rock, var = area, pb = p))

How would I do this in a drake plan? The way I'm doing it now, I have to come up with a new name for the progress object each time. Does anyone have a better idea?

plan <- drake_plan(
  n1 = 10,
  p1 = progress_estimated(n1),
  means_1 = map_df(1:n1, ~calc_mean(df = mtcars, var = mpg, pb = p1)),
  n2 = 20,
  p2 = progress_estimated(n2),
  means_2 = map_df(1:n2, ~calc_mean(df = rock, var = area, pb = p2))
)

config <- drake_config(plan)
make(plan)

Note that you can't just create the progress object inline in the call. A new object is then created on every iteration, so it won't keep track of progress properly:

n <- 10
map_df(1:n, ~calc_mean(mtcars, mpg, progress_estimated(n)))
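
To illustrate why the inline version fails, here is a minimal base R sketch. make_counter() is a hypothetical stand-in for progress_estimated(), not part of dplyr or drake; it only mimics an object that carries its tick count as mutable state.

```r
# Hypothetical stand-in for a progress object: a counter with mutable state.
make_counter <- function(n) {
  done <- 0
  list(
    tick = function() {
      done <<- done + 1  # update the state captured by this closure
      done
    },
    total = n
  )
}

n <- 10

# Shared object: created once, so every call advances the same state.
shared <- make_counter(n)
shared_ticks <- vapply(seq_len(n), function(i) shared$tick(), numeric(1))

# Fresh object per iteration: every call starts again from zero.
fresh_ticks <- vapply(seq_len(n), function(i) make_counter(n)$tick(), numeric(1))

shared_ticks  # counts up from 1 to n
fresh_ticks   # stays at 1 for every iteration
```

The shared counter reaches n, while the per-iteration counters never get past their first tick, which is the same thing that happens to a progress bar constructed inside the mapped call.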

I recommend writing an outer function around map_df(). That function can create the progress bar and pass it to calc_mean(). See calc_means() below.

library(drake)
library(tidyverse)

calc_means <- function(df, var, n) {
  pb <- progress_estimated(n)
  var <- enquo(var)
  map_df(seq_len(n), ~calc_mean(df = df, var = var, pb = pb))
}

calc_mean <- function(df, var, pb){
  pb$pause(0.2)$tick()$print()
  df %>%
    sample_n(nrow(df), replace = TRUE) %>%
    summarise(m = mean(!!var))
}

plan <- drake_plan(
  means_mtcars = calc_means(df = mtcars, var = mpg, n = 10),
  means_rock = calc_means(df = rock, var = area, n = 20)
)

make(plan)
#> target means_mtcars
#> |================================|100% ~0 s remaining     target means_rock
#> |================================|100% ~0 s remaining  

I like that you questioned your original plan with progress bar targets. Targets should have long runtimes and/or have meaningful return values. Progress bars are too quick, too trivial, and too brittle to make good targets.
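
The same idea, keeping the progress bar as an implementation detail of the function rather than a target, can be sketched without any packages. This is only an illustration under assumptions: boot_means() is a hypothetical name, and it swaps in base R's utils::txtProgressBar() in place of progress_estimated().

```r
# Sketch only: boot_means() is a hypothetical helper, not part of drake or dplyr.
boot_means <- function(x, n) {
  # The bar is created inside the function, so it never has to be a target.
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))  # always close the bar, even if an iteration fails
  vapply(seq_len(n), function(i) {
    utils::setTxtProgressBar(pb, i)             # advance the bar to step i
    mean(sample(x, length(x), replace = TRUE))  # one bootstrap mean
  }, numeric(1))
}

set.seed(1)
m <- boot_means(mtcars$mpg, n = 10)
length(m)
```

A function like this returns only the vector of bootstrap means, so a target that calls it would store the meaningful result and nothing about the bar itself.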

The more you write your own functions, the easier drake is to use. The webinar at https://ropensci.org/commcalls/2019-09-24 discusses this more.

Ah that's perfect! Thank you.

Yes, moving to drake has been a long time coming, but I've noticed that the more familiar I get with functions, the better it goes.
