Help running parallel copies of R code that calls a Fortran DLL

woodward · April 9, 2019, 9:29pm

Hi Rsters! I have a legacy simulation model written in Fortran 95 that I am calibrating using the excellent BayesianTools package. I was running it from R using the .Fortran() function, but was having memory problems for long runs/chains because my setup only worked under 32-bit R with a 32-bit DLL.

Recently I solved this by compiling the Fortran inside an R package following the excellent instructions at
https://www.avrahamadler.com/2018/12/09/the-need-for-speed-part-1-building-an-r-package-with-fortran/

This allowed the calibration to run under 64-bit R, solving the memory issues, but didn't really make it faster.

I am exploring options for speeding it up via parallelisation. I am running 3 independent calibration chains (each has 3 internal chains in DREAMzs) that should be easy to run on separate cores (I have an Intel Core i7 with 8 logical processors and 24 GB RAM, running Windows 10).
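
To make the setup concrete, here is a minimal sketch of running 3 independent chains on separate workers with the parallel package. The `run_chain()` function is a hypothetical stand-in (a simple random walk), not the real BayesianTools calibration, and the seed value is arbitrary. `clusterSetRNGStream()` is included on the assumption that each chain should draw from its own random-number stream.

```r
library(parallel)

n_chains <- 3

# Hypothetical stand-in for one calibration chain: a random walk.
# The real version would call the sampler and the Fortran model instead.
run_chain <- function(chain_id, n_iter) {
  draws <- cumsum(rnorm(n_iter))
  list(chain = chain_id, final = draws[n_iter])
}

cl <- makeCluster(n_chains)           # one worker process per chain
clusterSetRNGStream(cl, iseed = 42)   # independent, reproducible RNG streams

chains <- parLapply(cl, seq_len(n_chains), run_chain, n_iter = 1000)

stopCluster(cl)                       # release the worker processes

finals <- vapply(chains, function(ch) ch$final, numeric(1))
print(finals)
```

With one task per worker, the chains should not share any state, so the final values are expected to differ between chains.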

I have it running using the parallel package and parLapply(), which uses more CPU % than the serial version, but the elapsed time is the same. Is there some bottleneck caused by calling a Fortran DLL, even though it is compiled in an R package? Is there only a single instance of the DLL, or can I clone it to all the nodes?
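
On the single-instance question, a quick check is sketched below. It relies on the fact that `makeCluster()` with its default type starts each worker as a separate R process, so any DLL a worker loads lives in that worker's own process. The stats package is used here purely as a stand-in for a package with compiled code; substituting the real package name is left as an assumption.

```r
library(parallel)

cl <- makeCluster(2)

# Each worker reports its own process ID.
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))

# Each worker lists the DLLs loaded in its own process
# ("stats" stands in for the Fortran-backed package).
has_stats_dll <- unlist(clusterEvalQ(cl, "stats" %in% names(getLoadedDLLs())))

stopCluster(cl)

print(Sys.getpid())    # the master process
print(worker_pids)     # two further, distinct processes
print(has_stats_dll)
```

If the worker PIDs differ from each other and from the master, the workers are not sharing one in-memory copy of the DLL.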

Thanks!

PS: I'll see if I can come up with a meaningful reprex... hmm, it works for the SimpFort package vignette from the link above (see below), so maybe Fortran is not the problem...

library(SimpFort) # package that uses Fortran DLL
r <- 3 # number of times to run
n <- 100000000 # size of calculation
x <- rep(100000,n)

print("Run serial")
#> [1] "Run serial"
print(system.time(
  print(unlist(lapply(X=1:r, FUN=function(X){
    LLC_f(x, 50000, 90000)
  })))
))
#> [1] 1e+12 1e+12 1e+12
#>    user  system elapsed 
#>    0.35    0.00    0.36

library(parallel)
cl <- makeCluster(r)
clusterEvalQ(cl, library(SimpFort))
#> [[1]]
#> [1] "SimpFort"  "stats"     "graphics"  "grDevices" "utils"     "datasets" 
#> [7] "methods"   "base"     
#> 
#> [[2]]
#> [1] "SimpFort"  "stats"     "graphics"  "grDevices" "utils"     "datasets" 
#> [7] "methods"   "base"     
#> 
#> [[3]]
#> [1] "SimpFort"  "stats"     "graphics"  "grDevices" "utils"     "datasets" 
#> [7] "methods"   "base"
clusterExport(cl, "x")

print("Run parallel")
#> [1] "Run parallel"
print(system.time(
  print(unlist(parLapply(cl, X=1:r, fun=function(X){
    LLC_f(x, 50000, 90000)
  })))
))
#> [1] 1e+12 1e+12 1e+12
#>    user  system elapsed 
#>    0.00    0.00    0.14

stopCluster(cl) # shut down the worker processes when finished

Created on 2019-04-10 by the reprex package (v0.2.1)

woodward · April 9, 2019, 11:09pm

Solved it! It turns out I was timing the code incorrectly and triple-counting the time spent in the parallel version!
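
For anyone hitting something similar, here is one way this kind of over-counting can arise. This is an illustrative assumption, not necessarily the exact mistake in my script: if each task is timed inside its worker and the timings are added up, the total is roughly the number of tasks times the wall-clock time. `Sys.sleep(1)` stands in for the real model run.

```r
library(parallel)

r <- 3
cl <- makeCluster(r)

# Misleading: time each task inside its worker, then add the timings.
per_task <- parLapply(cl, 1:r, function(i) {
  system.time(Sys.sleep(1))[["elapsed"]]
})
summed <- sum(unlist(per_task))

# Wall-clock: wrap the whole parallel call in a single system.time().
wall <- system.time(
  parLapply(cl, 1:r, function(i) Sys.sleep(1))
)[["elapsed"]]

stopCluster(cl)

print(c(summed = summed, wall = wall))
```

The summed figure counts time that the workers spent simultaneously, so only the single outer `system.time()` reflects what the user actually waits for.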

andresrcs · April 10, 2019, 2:59am

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.
