purrr::in_parallel(): Helper functions not found in worker environment

I'm trying to use purrr::in_parallel() for parallel bootstrapping where I repeatedly fit models and extract summary statistics. My workflow has two functions: (1) a main function that fits a model on bootstrapped data, and (2) a helper function that extracts statistics of interest from the fitted model.

Based on the documentation/vignette, I understand that I need to explicitly pass objects to parallel workers, since each worker operates in its own isolated environment. However, even when I pass both my main function and my helper function to in_parallel(), the call fails because the worker cannot find the helper:

# FAILING EXAMPLE - Separate helper function
library(mirai)
library(purrr)
library(tibble)

# Create simple dataset
data <- tibble(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Helper function
my_helper <- function(model) {
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Main function that calls helper
my_main <- function(data) {
  # Bootstrap sample
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  # Fit model
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  # Call helper function
  my_helper(model)
}

# Start parallel workers
daemons(2)

results_fail <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main(data),
    data = data,
    my_main = my_main,
    my_helper = my_helper  # Helper passed but not found in worker environment
  )
)

# Clean up
daemons(0)

Running this produces the following error:

Error in `map()`:
ℹ In index: 1.
Caused by error in `my_helper()`:
! could not find function "my_helper"
Run `rlang::last_trace()` to see where the error occurred.

I expect the helper function to be available in the worker environment because I have explicitly passed it to in_parallel().

My current workaround: I can get the code to run if I move the helper function logic directly into the main function (i.e., combine them into a single function), but I would prefer my code to be more modular.

# WORKING EXAMPLE - Combined function approach
library(mirai)
library(purrr)
library(tibble)

# Create simple dataset
data <- tibble(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Single function with helper logic embedded
my_main_combined <- function(data) {
  # Bootstrap sample
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  # Fit model
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  # Helper logic inline
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Start parallel workers
daemons(2)

# This works fine
results_work <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main_combined(data),
    data = data,
    my_main_combined = my_main_combined
  )
)

print("Success!")
print(results_work[[1]])

# Clean up
daemons(0)

What am I missing here? What is the correct way to set up in_parallel() when my main function needs to call helper functions? Or is this a potential bug that I should open a GitHub issue for?

> sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22621)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tibble_3.3.0 purrr_1.1.0  mirai_2.4.1 

loaded via a namespace (and not attached):
 [1] compiler_4.5.1    magrittr_2.0.3    cli_3.6.5         tools_4.5.1       carrier_0.2.0     pillar_1.11.0    
 [7] glue_1.8.0        rstudioapi_0.17.1 vctrs_0.6.5       lifecycle_1.0.4   pkgconfig_2.0.3   rlang_1.1.6      
[13] nanonext_1.6.2   

Helper functions can now be passed directly to the ... argument of in_parallel(), provided you update the carrier package to version 0.3.0 or later. The upcoming purrr 1.2.0 release will enforce this minimum version.
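
If updating carrier is not possible right away (the session info above shows carrier 0.2.0), one pattern that should not depend on the carrier version is to hand the helper to the main function as an argument, instead of relying on it being found by name inside the worker. The sketch below is only an illustration, not the purrr authors' recommended method. The `helper` parameter is a hypothetical addition to my_main(), and a plain data.frame stands in for the tibble to keep the example free of extra dependencies. It assumes purrr >= 1.1.0 and mirai are installed.

```r
library(purrr)
library(mirai)

# Simple dataset (data.frame used here instead of a tibble; an assumption
# made only to keep the sketch dependency-free)
data <- data.frame(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)

# Helper function: extracts the slope and its p-value from a fitted model
my_helper <- function(model) {
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}

# Main function: takes the helper as an argument (hypothetical `helper`
# parameter) so it never has to look up `my_helper` by name
my_main <- function(data, helper) {
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  helper(model)
}

# Start parallel workers
daemons(2)

results <- map(
  1:100,
  in_parallel(
    # `my_helper` is resolved here, among the objects supplied below,
    # and then passed on to my_main() as a value
    function(x) my_main(data, helper = my_helper),
    data = data,
    my_main = my_main,
    my_helper = my_helper
  )
)

# Clean up
daemons(0)

print(results[[1]])
```

With carrier 0.3.0 or later, the original call from the question should work unchanged, as described above, so this pattern is only relevant on older versions.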
