How can I increase RStudio memory to process a large number of samples?

Hi,
I process over a thousand genomic samples in R and I need to increase R memory to complete the workflow successfully. RStudio is installed on Ubuntu, and my machine has 6 cores. Any suggestions on how to increase RStudio memory?
Also, does paid RStudio have extra memory over the free one?

Thanks,
Eman

Hi Eman,
I'm not sure I fully understand your question, so a few points first:

  1. Having 6 cores is nice, but cores give you processing power and not memory.
  2. It is not clear from the question whether you lack the physical memory to hold the data file before the calculations start, or whether RAM runs out during the calculations.
  3. RStudio, like any other program, will use whatever memory it can find, and the operating system will page RAM out to the hard drive when needed.
  4. What coding methods have you taken to increase CPU use over Memory use?

Possible solutions not related to RStudio:
The first thing to try is always to free up memory by closing unnecessary apps (RAM) or cleaning up the hard drive.

If these don't help, more drastic measures are:

Install another SSD, or upgrade your existing one.
Install more RAM, or better RAM (faster, higher-capacity modules) - this might require a new motherboard.

Or alternatively:
Use external services such as Spark or Google Colaboratory to upload the data and run the code on a server. These services have limited free tiers, which could be a solution (Colaboratory has a monthly quota).
I have never tried them, but Posit does have paid solutions such as Posit Connect and Posit Cloud.

Hi,

Thank you very much for your thorough response. I'm currently utilizing the DADA2 pipeline to process 16S PacBio sequences. The denoise step poses a significant challenge when employing the pooling option, as the code halts and the process is aborted. With the pooling option, the software attempts to load a large number of sequences, approximately 750GB in size, all at once. It appears that the memory capacity of R is unable to handle this load. Hence, I've resorted to using the pseudo option, although the results differ significantly.
The memory of my device is:
        total   used    free    shared  buff/cache  available
Mem:    62Gi    13Gi    7.9Gi   431Mi   41Gi        48Gi
Swap:   2.0Gi   936Mi   1.1Gi

I haven't employed any coding methods to prioritize CPU usage over memory usage. A few months back, I attempted to execute this R script on a more robust server, only to encounter failure at the same step (during the pooling code). This leads me to believe that the issue is indeed related to the memory capacity of the R software itself, irrespective of the specifications of the server where R is installed. While I'm not entirely certain, this is my understanding based on previous attempts.
So, I'm unsure which option would effectively resolve this issue: upgrading my computer, investing in a private cloud service, or utilizing an external service as you suggested. However, ultimately, I'll still be running the R script in RStudio with its limitations.

Thank you!

Hi, sorry for the late response; I didn't get the notification.

I'm a bit confused: you say you need to load 750 GB of sequences all at once, but the output you posted shows only 62 GiB of RAM and 2 GiB of swap on your computer.
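To put numbers on that gap, here is a rough back-of-the-envelope check, written in Python purely for illustration. It assumes the 750 GB figure is in decimal gigabytes and approximates the in-memory footprint, which may not hold exactly; the RAM and swap values are the ones from the posted output.

```python
# Figures from the thread: ~750 GB of sequences, 62 GiB RAM, 2 GiB swap.
# Assumption: 750 GB is decimal and roughly equals the in-memory footprint.
GB = 10**9     # decimal gigabyte
GiB = 2**30    # binary gibibyte, the unit used in the posted memory output

required_bytes = 750 * GB
ram_bytes = 62 * GiB
swap_bytes = 2 * GiB

available_bytes = ram_bytes + swap_bytes
shortfall_ratio = required_bytes / available_bytes

print(f"required : {required_bytes / GiB:.0f} GiB")
print(f"available: {available_bytes / GiB:.0f} GiB (RAM + swap)")
print(f"ratio    : {shortfall_ratio:.1f}x what the machine can hold")
```

Under these assumptions the data is roughly an order of magnitude larger than RAM and swap combined, so no software setting alone can close the gap.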

Let me be clear: older versions of R had a memory limit function that you could adjust (on Windows only), but newer versions don't. R will use any memory it can find.

As I wrote before, you have 3 options:

  1. Install more memory - this could include a new 1 TB SSD that the operating system can use as "RAM-like" memory (on Ubuntu, a larger swap file or partition; much slower than real RAM).
  2. Upload your data to a cloud service that lets you run code, such as Google Colaboratory, Posit Cloud, or Spark.
  3. Adjust your code to trade memory for computation: stream or recompute intermediate results instead of keeping everything in memory.
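As a generic illustration of the third option, here is a minimal Python sketch that handles one sample at a time and keeps only a compact summary (unique-sequence counts) instead of every raw read. The function names and the tiny in-memory "samples" are hypothetical, and the sketch assumes one sequence per line. It does not reproduce pooled denoising in DADA2, which needs information from all samples together; it only shows the streaming pattern.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield sequences one by one, skipping FASTA-style header lines."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            yield line


def summarize_sample(lines: Iterable[str]) -> Counter:
    """Count unique sequences in one sample without storing the raw reads."""
    counts: Counter = Counter()
    for seq in read_sequences(lines):
        counts[seq] += 1
    return counts


# Hypothetical toy data; in real use each value would be an open file handle,
# so only one sample's reads pass through memory at a time.
samples = {
    "sample_A": [">r1", "ACGT", ">r2", "ACGT", ">r3", "TTGA"],
    "sample_B": [">r1", "TTGA", ">r2", "TTGA"],
}

summaries = {name: summarize_sample(lines) for name, lines in samples.items()}

for name, counts in summaries.items():
    print(name, dict(counts))
```

Peak memory here scales with the number of unique sequences per sample rather than with the total number of reads across all samples, at the cost of losing whatever a joint analysis of all samples would provide.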

memory.limit() is no longer supported in R; calling it returns Inf with a warning.

There is no RStudio setting that increases the memory available for processing large genomic datasets. memory.limit() only ever applied to R on Windows and is no longer supported; on Linux, R can use whatever memory the operating system makes available to it. The paid version of RStudio doesn't provide additional memory.