Error in reading files in parallel using future_map()

Hello,

I am trying to read multiple Parquet files in parallel using the future_map() function.

The code runs in R version 4.2.2 (2022-10-31 ucrt) -- "Innocent and Trusting"
Platform: x86_64-w64-mingw32/x64 (64-bit)

I want to parallelize this Parquet-reading function:

library(arrow)
library(dplyr)

parallel_reading <- function(parquetfile, dx9dx10) {
  # Read the Parquet file as an Arrow Table rather than a data frame
  data <- read_parquet(parquetfile, as_data_frame = FALSE) %>%
    # Keep rows where any diagnosis column matches a code in dx9dx10 (a global vector)
    filter(if_any(c(DX1, DX2, DX3, DX4), ~ .x %in% dx9dx10)) %>%
    select(ENROLID, DX1, DX2, DX3, DX4, PDX, AGE, SEX, Service) %>%
    compute()

  return(data)
}

I then used future_map() to run it in parallel:

library(furrr)

plan(multisession)

temp_list <- future_map(seq_along(parquet_files), \(y) parallel_reading(parquet_files[y], dx9dx10))

Here, "parquet_files" is a character vector of three Parquet file names in my working directory, such as 'xyz.parquet', 'abc.parquet', and 'efg.parquet'.

However, when I run this future_map() call, temp_list ends up NULL and I get the error
"Error: Invalid , external pointer to null". Note that the same code runs fine sequentially; the error only appears under plan(multisession).
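
A possible explanation, offered as an assumption rather than a confirmed diagnosis: compute() returns an Arrow Table, which is an R object wrapping an external pointer to memory owned by the worker process. Under plan(multisession) each worker is a separate R session, so when the result is serialized back to the main session the pointer may arrive as null. A minimal sketch of a workaround is to end the pipeline with collect() so that each worker returns an ordinary tibble. The names parallel_reading_df() and read_all_parallel() are hypothetical and are not part of any package.

```r
library(arrow)
library(dplyr)
library(furrr)  # attaches future as well, which provides plan()

# Hypothetical variant of parallel_reading(): it ends with collect() instead of
# compute(), so each worker returns a plain tibble rather than an Arrow Table.
parallel_reading_df <- function(parquetfile, dx9dx10) {
  read_parquet(parquetfile, as_data_frame = FALSE) %>%
    filter(if_any(c(DX1, DX2, DX3, DX4), ~ .x %in% dx9dx10)) %>%
    select(ENROLID, DX1, DX2, DX3, DX4, PDX, AGE, SEX, Service) %>%
    collect()
}

# Hypothetical wrapper: map over the file names directly, one file per task.
# The packages are listed explicitly so that each worker attaches them.
read_all_parallel <- function(files, dx9dx10) {
  future_map(
    files,
    \(f) parallel_reading_df(f, dx9dx10),
    .options = furrr_options(packages = c("arrow", "dplyr"))
  )
}
```

With plan(multisession) already set, this would be called as temp_list <- read_all_parallel(parquet_files, dx9dx10). The trade-off is that every filtered result is copied into R memory, so this only suits cases where the filtered rows fit comfortably in RAM.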

Thanks in advance for your help!
