RStudio Server problem with CPU and RAM usage. How to improve it?

I am trying to read a 5.0 GB file (uploaded to my home directory in the cloud) using RStudio Server. My data frame has 450,000 rows x 920 columns. Even on a dedicated Google server (Linux, 625 GB RAM, 96 GPU), I seem to be hitting a memory or CPU issue, because it takes a long time to process. Running the same script on my personal computer (more than 10 times less powerful) seems to take about the same time.
Apparently I'm using less than 5% of the Google Cloud processing capacity.
How can I make RStudio Server use more CPU and memory to improve performance (reduce processing time)?

library(VIM)  # aggr() is provided by the VIM package
Data1 <- read.table("ArrayFiltrado3.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
aggr(Data1, prop = FALSE, numbers = TRUE)

Any help would be appreciated.

I think you are confusing RStudio Cloud with RStudio Server. The former is hosted by RStudio on their own servers, not on your Google Cloud instance, so the resources available on your server are not relevant to the RStudio Cloud service.

Hi @Marcel! Welcome!

Following @andresrcs’s point above, to make sure your question gets seen by the people most likely to be able to help, can you confirm whether you’re working in RStudio Cloud, or in RStudio Server?

Hi @jcblum. Thanks for the answer. You're right, the correct one would be R Server.

Hi @andresrcs. Thanks for the answer. You're right, the correct one would be R Server.

I don't think reading a single file with read.table can be parallelized, so having more cores or a GPU will have no effect on this task. Try something faster than read.table, such as fread from the data.table package.
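
To make that suggestion concrete, here is a minimal sketch comparing read.table() with data.table::fread(). It uses a small generated tab-separated file as a hypothetical stand-in for ArrayFiltrado3.txt, so the timings only show how to measure and say nothing about the real 5 GB file. The data.table = FALSE argument is used only so the result is a plain data frame like the one read.table() returns.

```r
# Sketch only: the demo file below is a made-up stand-in for ArrayFiltrado3.txt.
library(data.table)

# Build a small tab-separated file with a header row.
tmp <- tempfile(fileext = ".txt")
set.seed(1)
demo <- as.data.frame(matrix(rnorm(1000 * 20), nrow = 1000))
write.table(demo, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Base R reader, same arguments as in the question.
t_base <- system.time(
  d_base <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
)

# fread() with the same separator; data.table = FALSE returns a data.frame.
t_fread <- system.time(
  d_fread <- fread(tmp, sep = "\t", header = TRUE, data.table = FALSE)
)

# Elapsed seconds for each reader.
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

# Number of threads data.table is currently allowed to use.
print(getDTthreads())
```

On the real file you would replace tmp with "ArrayFiltrado3.txt". Unlike read.table(), fread() can use several threads while reading, which is where the extra cores on the server may start to matter.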

You might take a look at vroom, which is currently even faster than data.table::fread() in several circumstances. Its benchmarking vignette can give you an idea of how it (and other packages) might perform on your data.
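
A minimal sketch of what reading a tab-separated file with vroom could look like. The file and its columns (id, value, label) are invented for illustration and are not the structure of ArrayFiltrado3.txt.

```r
# Sketch only: the demo file and column names are hypothetical.
library(vroom)

# Build a small tab-separated file with a header row.
tmp <- tempfile(fileext = ".tsv")
set.seed(1)
demo <- data.frame(
  id    = 1:1000,
  value = rnorm(1000),
  label = sample(letters, 1000, replace = TRUE)
)
write.table(demo, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back; show_col_types = FALSE silences the column type message.
t_vroom <- system.time(
  d_vroom <- vroom(tmp, delim = "\t", show_col_types = FALSE)
)

print(t_vroom[["elapsed"]])
print(dim(d_vroom))
```

Part of vroom's speed comes from indexing the file first and deferring some parsing until the values are actually used, so it may be worth timing the whole workflow (the read plus the later aggr() call) rather than the read step alone.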
