I'm trying to use purrr::in_parallel() for parallel bootstrapping where I repeatedly fit models and extract summary statistics. My workflow has two functions: (1) a main function that fits a model on bootstrapped data, and (2) a helper function that extracts statistics of interest from the fitted model.
Based on the documentation/vignette, I understand that I need to explicitly pass objects to parallel workers since each worker operates in its own isolated environment. However, even when I pass both my main function and my helper function to in_parallel(), the worker cannot find the helper (the full error output follows the example):
# FAILING EXAMPLE - Separate helper function
library(mirai)
library(purrr)
library(tibble)
# Create simple dataset
data <- tibble(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)
# Helper function
my_helper <- function(model) {
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}
# Main function that calls helper
my_main <- function(data) {
  # Bootstrap sample
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  # Fit model
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  # Call helper function
  my_helper(model)
}
# Start parallel workers
daemons(2)
results_fail <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main(data),
    data = data,
    my_main = my_main,
    my_helper = my_helper # Helper passed but not found in worker environment
  )
)
# Clean up
daemons(0)
Error in `map()`:
ℹ In index: 1.
Caused by error in `my_helper()`:
! could not find function "my_helper"
Run `rlang::last_trace()` to see where the error occurred.
I expect the helper function to be available in the worker environment because I have explicitly passed it to in_parallel().
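One possible explanation, offered as an unconfirmed guess rather than a statement about purrr's internals: a function passed through `...` keeps its own enclosing environment, so my_main would look up my_helper starting from the worker's global environment rather than from wherever in_parallel() stores the passed objects. The base-R sketch below imitates that situation with plain environments and no workers; `worker_global`, `crate_env`, `my_main_sim`, and `my_main_arg` are hypothetical names introduced only for illustration.

```r
# Base-R sketch only: plain environments stand in for a worker. The names
# worker_global and crate_env are hypothetical, not purrr or mirai objects.

# Stand-in for a fresh worker's global environment, where my_helper is absent.
# The emptyenv() parent stops lookups from falling through to this session.
worker_global <- new.env(parent = emptyenv())

# Stand-in for the isolated environment holding the objects passed via `...`.
crate_env <- new.env(parent = worker_global)
crate_env$my_helper <- function(x) x + 1

# my_main keeps its own enclosing environment, here the worker's global one.
my_main_sim <- function(x) my_helper(x)
environment(my_main_sim) <- worker_global
crate_env$my_main <- my_main_sim

# The call finds my_main in crate_env, but my_main then looks for my_helper
# starting from worker_global, so the lookup fails.
res_fail <- tryCatch(
  eval(quote(my_main(1)), crate_env),
  error = function(e) conditionMessage(e)
)
print(res_fail)

# Variant: the helper travels as an argument, so my_main never needs to look
# it up by name in its own enclosing environment.
my_main_arg <- function(x, helper) helper(x)
environment(my_main_arg) <- worker_global
crate_env$my_main_arg <- my_main_arg
res_ok <- eval(quote(my_main_arg(1, my_helper)), crate_env)
print(res_ok)
```

If that guess is right, giving my_main a `helper` argument and calling `my_main(data, my_helper)` inside `.f` might keep the two functions separate, but this has not been verified against in_parallel() itself.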
My current workaround: I can get the code to run if I move the helper function logic directly into the main function (i.e., combine them into a single function), but I would prefer my code to be more modular.
# WORKING EXAMPLE - Combined function approach
library(mirai)
library(purrr)
library(tibble)
# Create simple dataset
data <- tibble(
  predictor = rnorm(100),
  outcome = rbinom(100, 1, 0.5)
)
# Single function with helper logic embedded
my_main_combined <- function(data) {
  # Bootstrap sample
  n <- nrow(data)
  boot_data <- data[sample(n, n, replace = TRUE), ]
  # Fit model
  model <- glm(outcome ~ predictor, data = boot_data, family = binomial)
  # Helper logic inline
  data.frame(
    coefficient = coef(model)[2],
    pvalue = summary(model)$coefficients[2, 4]
  )
}
# Start parallel workers
daemons(2)
# This works fine
results_work <- map(
  1:100,
  in_parallel(
    .f = function(x) my_main_combined(data),
    data = data,
    my_main_combined = my_main_combined
  )
)
print("Success!")
print(results_work[[1]])
# Clean up
daemons(0)
What am I missing here? What is the correct way to set up in_parallel() when my main function needs to call helper functions? Is this a potential bug that I should create a GitHub Issue for?
> sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22621)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8 LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C LC_TIME=English_Canada.utf8
time zone: America/Toronto
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tibble_3.3.0 purrr_1.1.0 mirai_2.4.1
loaded via a namespace (and not attached):
[1] compiler_4.5.1 magrittr_2.0.3 cli_3.6.5 tools_4.5.1 carrier_0.2.0 pillar_1.11.0
[7] glue_1.8.0 rstudioapi_0.17.1 vctrs_0.6.5 lifecycle_1.0.4 pkgconfig_2.0.3 rlang_1.1.6
[13] nanonext_1.6.2