readxl is really slow reading from network files

readxl::read_xlsx is really slow reading from my network drives. This has started within the past few months. Code that I have been running for years is almost un-usable now.

Do you think this is due to the architecture of the network drives? I am wondering if this is something that I can alleviate or if I will need to modify my approach to copy files local or use a different package.

Here is a reprex
localpath <- "path to a folder on my computer"
networkpath <- "path to a shared network drive"

#create a test tibble with 26 rows and 2 columns
df <- tibble::tibble(column_a = letters, column_b = 1:26)

time test #1

#write it to local drive
writexl::write_xlsx(df, path = stringr::str_c(localpath, "testfile.xlsx"))

#read in from local drive.
microbenchmark::microbenchmark(
readxl::read_xlsx(stringr::str_c(localpath, "testfile.xlsx"))
)
#this takes an average of 11 miliseconds

time test #2

#write it to a network drive
writexl::write_xlsx(df, path = stringr::str_c(networkpath, "testfile.xlsx"))

#read it in from a network drive
microbenchmark::microbenchmark(
readxl::read_xlsx(stringr::str_c(networkpath, "testfile.xlsx"))
)
#this takes an average of 679 miliseconds which is 61 times slower than above.

this section does the same tests using csv files to verify the issue is specific to readxl::read_xlsx

readr::write_csv(df, stringr::str_c(localpath, "testfile.csv"))

#this takes 177 miliseconds
microbenchmark::microbenchmark(
readr::read_csv(stringr::str_c(localpath, "testfile.csv"))
)

readr::write_csv(df, stringr::str_c(networkpath, "testfile.csv"))

#this takes 336 miliseconds which is 2 times slower
microbenchmark::microbenchmark(
readr::read_csv(stringr::str_c(networkpath, "testfile.csv"))
)

Here is my sessionInfo()

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] readxl_1.4.2 writexl_1.4.2

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3 cellranger_1.1.0

Not sure about the issue, but I've found that openxlsx::read.xlsx() is much faster from network drives.
Read, Write and Edit xlsx Files • openxlsx (ycphs.github.io) Read, Write and Edit xlsx Files • openxlsx (ycphs.github.io)

Thank you for the reply! I am not sure if I will use openxlsx or write wrapper code to copy local to temp directory.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.