Hello,
I am trying to read multiple Parquet files in parallel using the future_map function.
I am running R version 4.2.2 (2022-10-31 ucrt) on platform x86_64-w64-mingw32/x64 (64-bit).
I want to parallelize this Parquet-reading function:
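library(arrow)   # read_parquet()
library(dplyr)   # %>%, filter(), if_any(), select(), compute()
library(furrr)   # future_map(); also attaches future for plan()

parallel_reading <- function(parquetfile, dx9dx10) {
  # Read the Parquet file as an Arrow Table rather than a data frame
  data <- read_parquet(parquetfile, as_data_frame = FALSE) %>%
    # Keep rows where any diagnosis column matches a code in dx9dx10 (a character vector)
    filter(if_any(c(DX1, DX2, DX3, DX4), ~ (.x %in% dx9dx10))) %>%
    select(ENROLID, DX1, DX2, DX3, DX4, PDX, AGE, SEX, Service) %>%
    compute()
  return(data)
}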
I then use future_map to run it in parallel:
plan(multisession)
# Iterate over the file names directly; dx9dx10 is passed through to each call
temp_list <- future_map(parquet_files, \(f) parallel_reading(f, dx9dx10))
Here "parquet_files" is a character vector of three Parquet file names in my working directory, such as 'xyz.parquet', 'abc.parquet', and 'efg.parquet'.
However, when I run this future_map call, temp_list ends up NULL and I get the error
"Error: Invalid , external pointer to null". Note that the same code runs fine sequentially; the error only appears under plan(multisession).
Thanks in advance for your help!
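For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the arrow, dplyr, and furrr packages are installed. The toy data, the temp-directory file paths, and the two-worker plan are made up for illustration. It also swaps compute() for collect(), on the unconfirmed guess that the error comes from returning an Arrow Table (which wraps an external pointer) from a worker session, whereas collect() returns an ordinary data frame that can be sent back to the main session.

```r
library(arrow)
library(dplyr)
library(furrr)  # also attaches future, which provides plan()

# Hypothetical toy data standing in for the real claims files
toy <- data.frame(
  ENROLID = 1:4,
  DX1 = c("A01", "B02", "C03", "D04"),
  DX2 = c("X", "A01", "X", "X"),
  DX3 = "X", DX4 = "X", PDX = "X",
  AGE = c(30L, 41L, 52L, 63L),
  SEX = c("1", "2", "1", "2"),
  Service = "inpatient",
  stringsAsFactors = FALSE
)

# Write three identical toy files to a temp directory (illustrative paths)
parquet_files <- file.path(tempdir(), c("xyz.parquet", "abc.parquet", "efg.parquet"))
for (f in parquet_files) write_parquet(toy, f)

dx9dx10 <- c("A01", "C03")  # hypothetical diagnosis codes

parallel_reading <- function(parquetfile, dx9dx10) {
  read_parquet(parquetfile, as_data_frame = FALSE) %>%
    filter(if_any(c(DX1, DX2, DX3, DX4), ~ (.x %in% dx9dx10))) %>%
    select(ENROLID, DX1, DX2, DX3, DX4, PDX, AGE, SEX, Service) %>%
    collect()  # assumption: return a plain data frame instead of an Arrow Table
}

plan(multisession, workers = 2)
temp_list <- future_map(
  parquet_files,
  \(f) parallel_reading(f, dx9dx10),
  # Explicitly attach the packages on each worker
  .options = furrr_options(packages = c("arrow", "dplyr"))
)
plan(sequential)  # shut the background workers down again
```

If this variant runs without the error while the compute() version fails, that would point to the Arrow object not surviving the trip back from the worker.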