Running in parallel (multiple cores vs. multiple source windows)

McDreimiller · April 20, 2022, 2:22pm

I manage a faculty computer lab and I have a couple of 8-core iMacs that have RStudio installed on them. A faculty member is using one of them to run some code that takes more than a week to run. They mentioned that there are multiple sections in their code that are basically the same, but each runs an increasing number of simulations, so the first section might take an hour and the last might run for a week.

I asked the faculty member if their RStudio would automatically take advantage of the multiple cores on the iMac. They showed me that you can open a second source window and run code in it while the first source window is still running code.

I'm not an R user but I wondered if there wasn't a better way to run the code so that each section ran in its own core. So I did some Google searching and found this thread about running code in parallel in R.

So, my question is: does opening a second source window automatically run code on a separate core from the first source window? If so, can we open more than two source windows so that each section of the code runs in its own source window on its own core? Or do we have to modify the code as discussed in the above link so that each section runs on a separate core?

michaelmayer · April 21, 2022, 8:54am

The R interpreter is single-threaded. Hence, by default, a given R process will only use one core (some packages and optimised BLAS libraries are exceptions).

The only benefit you get from using 1 out of 8 cores on such a machine is the higher clock speed the CPU can reach on a single core thanks to Turbo Boost. That makes the code run somewhat faster, but very likely not as fast as it would if all 8 cores were used at once.

So running 8 R processes on an 8-core machine, each in a different source window, would make good use of the 8-core iMac, but it is probably not the most efficient workflow from the user's perspective.
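As a rough, programmatic equivalent of opening one source window per section, here is a minimal sketch using mcparallel() and mccollect() from the parallel package that ships with R. It assumes macOS or Linux (these functions fork the R process), and run_section() and the section sizes are hypothetical stand-ins for the faculty member's code, not their actual script.

```r
library(parallel)

# Hypothetical stand-in for one "section" of the script: the same simulation
# repeated n_sims times.
run_section <- function(n_sims) {
  set.seed(n_sims)   # per-section seed so results are reproducible
  mean(replicate(n_sims, mean(rnorm(100))))
}

# Hypothetical section sizes, growing like the sections described above.
n_sims_per_section <- c(10, 100, 1000, 10000)

# Number of (logical) cores R can see; may be NA on some platforms.
n_cores <- detectCores()

# Fork one background R process per section (macOS/Linux only). Each job runs
# on whichever core the operating system assigns to it.
jobs <- lapply(n_sims_per_section, function(n) mcparallel(run_section(n)))

# Wait until every job has finished and collect the results in job order.
results <- mccollect(jobs)
```

Unlike juggling several source windows, this keeps all sections in one script, and the operating system spreads the forked processes across the available cores.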

Using more than one core in R can be achieved programmatically in multiple ways. Some R packages provide support for parallel compute backends. If your code spends most of its time in function calls to such packages, you can possibly speed it up significantly with few or no changes to the code.

Regarding your faculty member's code: you mention "multiple sections of code that are basically the same" and an "increasing number of simulations". Simulation runs are typically independent of each other, so parallelising them is usually straightforward. If the code already uses:

* for loops to iterate over the simulations, those can be converted into a foreach loop registered against a doMC parallel backend.
* apply functions, those can be converted into their equivalents from the parallel package, such as parLapply() or parSapply(), which run on a cluster of worker processes.
* the futureverse, it is fairly straightforward to switch to parallel computing by changing the future plan().
* functional programming with purrr, you can parallelise the code by using the furrr equivalent of each purrr function.
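A minimal sketch of the first two conversions, assuming the simulation runs are independent. one_sim() and n_sims are hypothetical; parSapply() and makeCluster() come from the parallel package bundled with R, and the foreach/doMC part only runs if those two CRAN packages are installed.

```r
library(parallel)

# Hypothetical single simulation run. Seeding from the run index makes each
# run reproducible regardless of which worker executes it.
one_sim <- function(i) {
  set.seed(i)
  mean(rnorm(1000))
}

n_sims <- 200

# Serial version: an apply-style loop over the simulations.
serial <- sapply(seq_len(n_sims), one_sim)

# Leave one core free, use at most 8 (the iMac's core count), and fall back to
# 1 worker if detectCores() returns NA.
n_workers <- max(1L, min(8L, detectCores() - 1L), na.rm = TRUE)

# Parallel version: same call via parSapply() on a cluster of worker processes.
cl <- makeCluster(n_workers)
parallel_res <- parSapply(cl, seq_len(n_sims), one_sim)
stopCluster(cl)

# foreach + doMC version of a for loop (doMC forks, so macOS/Linux only).
if (requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doMC", quietly = TRUE)) {
  library(foreach)
  doMC::registerDoMC(cores = n_workers)
  foreach_res <- foreach(i = seq_len(n_sims), .combine = c) %dopar% one_sim(i)
}
```

Seeding each run by its index is a simple way to make the parallel results match the serial ones; for real work, parallel::clusterSetRNGStream() (or the seed handling in the futureverse) is the more robust choice.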

There are also dedicated tools/R packages for parallelisation beyond a single server, such as batchtools and clustermq. These are especially useful if the code eventually also needs to run on larger compute infrastructure such as HPC clusters.

With the 8-core iMac, the faculty member can expect a speed-up from parallelisation of at most a factor of 8, but they should also be aware of Amdahl's law, which says that the maximum speed-up of parallelised code critically depends on the fraction of the code that cannot be parallelised.
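For reference, Amdahl's law is usually written as below, with p the fraction of the runtime that can be parallelised and N the number of cores. The value p = 0.95 is only an assumed example, not a measurement of the faculty member's code.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

% Example: p = 0.95 on the 8-core iMac
S(8) = \frac{1}{0.05 + \frac{0.95}{8}} = \frac{1}{0.05 + 0.11875} = \frac{1}{0.16875} \approx 5.93

% Upper bound, no matter how many cores are added
\lim_{N \to \infty} S(N) = \frac{1}{1 - p} = \frac{1}{0.05} = 20
```

So even with 95 % of the runtime parallelised, 8 cores give roughly a 6x speed-up rather than 8x.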

There are many more options to consider, but before any parallelisation effort the most important thing to remember is to optimise the code as it runs single-threaded. Vectorising the code (replacing simple for loops, ...) can speed it up much more than any of the parallelisation methods mentioned above. After all, R is an interpreted language: the less work is done by the interpreter and the more is pushed down into compiled code, the better.
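A small illustration of what vectorisation means here, using a hypothetical task (sum of squared deviations from the mean). Both functions are made up for the example; actual timings depend on the machine, so none are quoted.

```r
# Hypothetical input data.
set.seed(1)
x <- rnorm(1e6)

# Loop version: every iteration goes through the R interpreter.
ssd_loop <- function(x) {
  m <- mean(x)
  total <- 0
  for (xi in x) {
    total <- total + (xi - m)^2
  }
  total
}

# Vectorised version: subtraction, squaring and summing run in compiled code.
ssd_vec <- function(x) sum((x - mean(x))^2)

# Compare the timings on your own machine; the results agree up to rounding.
system.time(r1 <- ssd_loop(x))
system.time(r2 <- ssd_vec(x))
all.equal(r1, r2)
```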

General guidance on performance optimisation in R can be found in chapter 24, "Improving performance", of Advanced R.

Both performance optimisation and parallelisation can be time-consuming, but they are well worth pursuing, especially if the code is going to be reused (i.e. run more than once).

McDreimiller · April 22, 2022, 1:42pm

Excellent! Thanks Michael!
