purrr::in_parallel(): Helper functions not found in worker environment

kebabpizza · August 19, 2025, 3:19pm

I'm trying to use purrr::in_parallel() for parallel bootstrapping where I repeatedly fit models and extract summary statistics. My workflow has two functions: (1) a main function that fits a model on bootstrapped data, and (2) a helper function that extracts statistics of interest from the fitted model.

Based on the documentation/vignette, I understand that I need to explicitly pass objects to parallel workers since each worker operates in its own isolated environment. However, even when I pass both my main function and my helper function to in_parallel(), the call fails with a could not find function "my_helper" error (full message shown after the code):

# FAILING EXAMPLE - Separate helper function
library(mirai)
library(purrr)
library(tibble)

# Create simple dataset
data <- tibble(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Helper function
my_helper <- function(model) {
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Main function that calls helper
my_main <- function(data) {
  # Bootstrap sample
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  # Fit model
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  # Call helper function
  my_helper(model)
}

# Start parallel workers
daemons(2)

results_fail <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main(data),
    data = data,
    my_main = my_main,
    my_helper = my_helper  # Helper passed but not found in worker environment
  )
)

# Clean up
daemons(0)

Error in `map()`:
ℹ In index: 1.
Caused by error in `my_helper()`:
! could not find function "my_helper"
Run `rlang::last_trace()` to see where the error occurred.

I expect the helper function to be available in the worker environment because I have explicitly passed it to in_parallel().

My current workaround: I can get the code to run if I move the helper function logic directly into the main function (i.e., combine them into a single function), but I would prefer my code to be more modular.

# WORKING EXAMPLE - Combined function approach
library(mirai)
library(purrr)
library(tibble)

# Create simple dataset
data <- tibble(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Single function with helper logic embedded
my_main_combined <- function(data) {
  # Bootstrap sample
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  # Fit model
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  # Helper logic inline
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Start parallel workers
daemons(2)

# This works fine
results_work <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main_combined(data),
    data = data,
    my_main_combined = my_main_combined
  )
)

print("Success!")
print(results_work[[1]])

# Clean up
daemons(0)
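
If inlining the helper is undesirable, one modular alternative is to hand the helper to the main function as an argument. It relies only on behaviour the working example already shows: objects passed through ... are visible inside .f. The sketch below is an illustration, not a confirmed recommendation from the package authors. The names my_main_arg and helper are hypothetical, and a plain data.frame stands in for the tibble to keep dependencies minimal.

```r
library(mirai)
library(purrr)

# Plain data.frame stand-in for the tibble used in the thread
data <- data.frame(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Helper stays a separate, reusable function
my_helper <- function(model) {
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Hypothetical variant of my_main: the helper arrives as an argument,
# so it is resolved as a local variable instead of being looked up by
# name in the function's enclosing environment on the worker
my_main_arg <- function(data, helper) {
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  helper(model)
}

# Start parallel workers
daemons(2)

results_arg <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main_arg(data, my_helper),
    data = data,
    my_main_arg = my_main_arg,
    my_helper = my_helper
  )
)

# Clean up
daemons(0)
```

Here my_helper is referenced only inside .f, where the objects supplied through ... are known to be available, so my_main_arg never has to find it by name on the worker.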

What am I missing here? What is the correct way to set up in_parallel() when my main function needs to call helper functions? Is this a potential bug that I should create a GitHub Issue for?

> sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22621)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tibble_3.3.0 purrr_1.1.0  mirai_2.4.1 

loaded via a namespace (and not attached):
 [1] compiler_4.5.1    magrittr_2.0.3    cli_3.6.5         tools_4.5.1       carrier_0.2.0     pillar_1.11.0    
 [7] glue_1.8.0        rstudioapi_0.17.1 vctrs_0.6.5       lifecycle_1.0.4   pkgconfig_2.0.3   rlang_1.1.6      
[13] nanonext_1.6.2

shikokuchuo · October 30, 2025, 10:14pm

Helper functions can now be passed directly to the ... argument of in_parallel() if you update the carrier package to version 0.3.0 or greater. The upcoming purrr 1.2.0 release will enforce this minimum carrier version.
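
As a minimal sketch of what that looks like in practice, the original failing call can be rerun unchanged once carrier has been updated. The version guard at the top is an addition for illustration (it stops early on an older carrier), and a plain data.frame replaces the tibble to keep the example self-contained.

```r
library(mirai)
library(purrr)

# Assumption: carrier >= 0.3.0 is installed, as described above;
# this guard is illustrative and stops early on older versions
stopifnot(packageVersion("carrier") >= "0.3.0")

# Plain data.frame stand-in for the tibble used in the question
data <- data.frame(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Helper function, kept separate from the main function
my_helper <- function(model) {
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Main function that calls the helper by name
my_main <- function(data) {
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  my_helper(model)
}

# Start parallel workers
daemons(2)

results <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main(data),
    data = data,
    my_main = my_main,
    my_helper = my_helper  # passed via ... alongside the main function
  )
)

# Clean up
daemons(0)
```

With the updated package, results should be a list of 100 one-row data frames, one per bootstrap replicate.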
