I am observing significant differences in performance and memory consumption between running a script interactively and submitting it as a local job on the community edition of RStudio Server.
Background:
- Running Ubuntu 18.04 LTS with RStudio Server (Version 1.2.1335). Server has 64GB of RAM and 16 cores.
- I have a long-running optimization script using the GA (genetic algorithms) package, which makes use of the doParallel package to parallelize across the cores.
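For context, the parallel setup follows the usual GA pattern, roughly like this (the fitness function, bounds, and GA settings below are placeholders, not my actual model):

```r
library(GA)
library(doParallel)

# Placeholder objective; the real script sources its fitness
# function from another R file in the working directory.
fitness_fn <- function(x) -sum(x^2)

result <- ga(
  type     = "real-valued",
  fitness  = fitness_fn,
  lower    = rep(-5, 10),
  upper    = rep( 5, 10),
  popSize  = 100,
  maxiter  = 500,
  parallel = TRUE   # GA starts a doParallel cluster across available cores
)
```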
Observations:
- When I run the script via the console or user interface, the process generally takes about 1.25 hours and consumes roughly 20% of the 64GB of RAM for the duration, with average processor utilization around 80% across the 16 cores.
- When I submit the same script as a local job, I observe different behavior: memory usage spikes to nearly 100% and then encroaches on swap. After that, processor utilization sits around 1-2%, jumping to about 50% for a few seconds every 5 minutes. I let the job run for about 2 hours before stopping it.
Other Facts:
- In both scenarios, no other user-initiated processes were running on the server.
- The script is self-contained, sourcing other required R scripts and reading RDS files as needed from the working directory.
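Its setup section looks roughly like this (file names here are purely illustrative):

```r
# Setup section of the job script (file names are hypothetical)
source("helper_functions.R")           # supporting R code
model_inputs <- readRDS("inputs.rds")  # pre-computed data
```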
Research to Date:
- I've reviewed the RStudio Server Pro manual and found the rserver.conf and rsession.conf options. Both files are currently empty.
- I have a theory that the inconsistent behavior may have to do with either:
  - The R environment and the 'copy job results' option, though I have not found a resolution yet; or
  - The GA package, in the job-created environment, failing to correctly detect the available server resources (i.e., number of cores). This would not explain the high memory usage, though.
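To test the core-detection theory, I'm considering adding a diagnostic header like the following to the top of the job script (the log file name is arbitrary; since a local job has no interactive console, it writes to a file):

```r
# Log what the job environment actually sees at startup
log_file <- file.path(getwd(), "job_diagnostics.log")
cat(sprintf("detectCores: %d\n", parallel::detectCores()),
    sprintf("registered foreach workers: %d\n", foreach::getDoParWorkers()),
    sprintf("R memory in use (MB): %.1f\n", sum(gc()[, 2])),
    file = log_file, append = TRUE)
```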
Any help or pointers to research areas I've overlooked would be greatly appreciated. Thank you!