Maximize RStudio memory for DADA2 workflow

Hi,

I have approximately 1,000 16S samples sequenced on the PacBio platform, and I am using the DADA2 workflow to infer ASVs. The script halts at the dada() step because I use the pooling option:

dd <- dada(filts, err=err, pool=TRUE, BAND_SIZE=32, OMEGA_A=1e-10, DETECT_SINGLETONS=FALSE, multithread=TRUE)
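For context, filts and err come from the usual upstream DADA2 steps earlier in the script, roughly like this (the path below is a placeholder for my filtered PacBio CCS reads):

library(dada2)
# placeholder path to the filtered reads produced by filterAndTrim()
filts <- list.files("filtered", pattern = "\\.fastq\\.gz$", full.names = TRUE)
# PacBioErrfun is the error-estimation function recommended for PacBio CCS data
err <- learnErrors(filts, errorEstimationFunction = PacBioErrfun,
                   BAND_SIZE = 32, multithread = TRUE)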

I tried different servers with more memory, but that did not work, so I concluded that the issue is a limit on the memory available to the RStudio/R session. I tested two options to maximize the memory, as follows:

1- use the doSNOW package

library(doSNOW)

# create a cluster of 5 worker processes
cl <- makeCluster(5)
# register the cluster as the parallel backend for foreach
registerDoSNOW(cl)
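As far as I understand, registering a doSNOW backend only parallelizes foreach loops like the toy sketch below; dada() manages its own threads via multithread=TRUE, so this does not change how much memory the dada step can use:

library(foreach)
# the doSNOW backend only applies to foreach loops such as this one
# (the loop body is just a toy example that counts lines per file)
n_lines <- foreach(f = filts, .combine = c) %dopar% length(readLines(f))
stopCluster(cl)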

2- use the unix package
install.packages("unix")

library(unix)

rlimit_all()
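As far as I can tell, rlimit_all() only reports the current resource limits rather than raising them. If a soft limit were really the bottleneck, raising it would look something like the sketch below (rlimit_as() controls the address-space limit; whether this limit is actually capped on my machine is an assumption on my part):

rlimit_all()          # inspect the current soft/hard limits
# hypothetical: lift the soft cap on virtual memory, if the hard limit allows it
rlimit_as(cur = Inf)
rlimit_all()          # confirm the change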

But neither of these worked, and the script still halts at the dada step.
I am using RStudio installed on a machine running Ubuntu 20.04.
The full dataset is about 50 GB; however, my trial was on a 30 GB subset of the samples.
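In case it helps with diagnosis, this is roughly how I watch memory while dada() runs (free -h is just shelled out from R; the object-size check is a rough look at what the session itself is holding):

# snapshot of total/used/available RAM and swap on the machine
system("free -h")
# approximate sizes (in bytes) of the five largest objects in the R session
head(sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE), 5)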

Your help is appreciated!

Are you getting an error message to the effect of "cannot allocate vector greater than 8388608"?

No, but the R session is aborted at the dada step. I briefly discussed this issue with the DADA2 developer, and the recommendation was to use a supercomputer, which suggests it is a memory limitation. Even after using a computer with higher specs I still have the same problem, so I searched online and concluded that it is a software memory limitation.
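My current guess is that the Linux out-of-memory killer is terminating the R session rather than R itself throwing an allocation error. One way I try to check this after a crash (assuming the kernel log is readable on the machine, which may require sudo) is:

# look for OOM-killer entries in the kernel log after the session dies;
# may need sudo depending on dmesg restrictions
system("dmesg -T | grep -iE 'out of memory|oom-killer' | tail")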
