Big rds file, disconnect


Hello all,

My app lets the user upload an rds file - I am testing with one that is a few hundred MB, but some of them are over a GB - and then reads it with readRDS(), and the app gets disconnected from the server. I commented out almost everything else, so I am pretty sure the rds file upload/reading is where this happens. I increased the read and startup timeouts to their maximum values (60 and 300), but this did not change anything. Do you have any recommendations for how I can get this running?
In case it matters, I am using Firefox. Oh, and locally it works fine.
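In case it helps, the relevant part of the app looks roughly like this (simplified, with made-up input IDs):

    library(shiny)

    ui <- fluidPage(
      fileInput("rds_file", "Upload .rds file"),
      verbatimTextOutput("summary")
    )

    server <- function(input, output, session) {
      dataset <- reactive({
        req(input$rds_file)
        # the disconnect seems to happen around here
        readRDS(input$rds_file$datapath)
      })
      output$summary <- renderPrint(str(dataset(), max.level = 1))
    }

    shinyApp(ui, server)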
Thank you in advance

Is it on shinyapps.io or your own server? It's possible that you're running out of RAM.

It is on shinyapps.io. To accommodate the larger file size, I added: options(shiny.maxRequestSize = 3 * 1024^3). I had not seen anything about shinyapps.io RAM limits, but I did not expect them to be so "low". I should probably look into this.
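(For anyone finding this later: that line goes at the top of app.R, outside the server function, so it applies to the whole app.)

    # top of app.R
    options(shiny.maxRequestSize = 3 * 1024^3)  # allow uploads up to 3 GB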

In that case, out of memory is a pretty likely cause. Go to the app's Settings; in the "General" tab you have an "Instance Size" option. For free accounts, the biggest you can pick is "Large (1 GB)"; with a "Basic" account or higher you can go up to 8 GB.

If your RDS is over a GB on disk, it's probably as big or bigger in RAM (RDS files are compressed by default, so the in-memory object is usually larger). So what you're seeing is likely the instance running out of memory when trying to read all that data. Besides, 1 GB is a lot of data for an app: it will take a while to load when you open the app, and many operations will be slow.

So, is there a way to make the dataset smaller (e.g. precomputing some results, loading only aggregates, etc.)? If not, could you consider using Parquet files or a DuckDB database? That way the whole dataset stays on disk, and you load only what you need, when you need it.
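For example, with the arrow package you can convert the table to Parquet once, and then the app opens it lazily and pulls only the rows/columns it needs (object and column names here are made up):

    library(arrow)
    library(dplyr)

    # one-off, outside the app: convert the big table to Parquet
    write_parquet(big_table, "data/expression.parquet")

    # in the app: open lazily, filter on disk, collect only the result
    ds <- open_dataset("data/expression.parquet")
    subset <- ds |>
      filter(gene == "ACTB") |>
      select(cell_id, expression) |>
      collect()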

Thank you so much. This seems to be the culprit indeed. I found an old rds file just below 30 MB (useless for real use, but fine as a proof of concept), and indeed it went through, including after uncommenting and using some of the other features.
Thank you for the suggestions. I will try to remove the RNA assay and see whether it helps (the files have a number of samples merged, and that is the problem). I will also check out Parquet files and DuckDB.
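I am thinking something along these lines with Seurat::DietSeurat(), keeping only the assays and reductions the app actually uses (assay/reduction names are guesses, untested on my data):

    library(Seurat)

    obj <- readRDS("merged_samples.rds")

    # keep only the integrated assay and the UMAP, drop everything else
    slim <- DietSeurat(
      obj,
      assays    = "integrated",  # drops the raw RNA assay
      dimreducs = "umap"
    )

    saveRDS(slim, "merged_samples_slim.rds", compress = "xz")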
Thanks again!

Is it a scRNA-Seq dataset where the rds contains the full Seurat object? If so, I might be able to give you some inspiration: we had this app (code, shinyapps) that loaded the whole object and was quite slow and not memory-efficient; by removing the UMAP and saving different parts of the Seurat object as individual files that are loaded only when needed, I managed to make it quite a bit faster and more efficient (code, shinyapps). I am not saying you need to do the same thing, but it might give you ideas.
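The splitting itself happens once, offline, so the app never has to read the full object; roughly like this (paths are illustrative, assuming v4-style Seurat accessors):

    library(Seurat)

    obj <- readRDS("full_seurat_object.rds")

    # save the pieces the app needs as separate, smaller files
    saveRDS(obj@meta.data,                       "parts/meta.rds")
    saveRDS(Embeddings(obj, reduction = "umap"), "parts/umap.rds")
    saveRDS(GetAssayData(obj, slot = "data"),    "parts/expr.rds")

    # in the app, each part is read only when a tab actually needs it
    umap <- readRDS("parts/umap.rds")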

I have this other app (code, shinyapps) where I could not precompute much and had to use a big, dense table; in that case I managed to get something decently efficient using DuckDB.
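The pattern there is roughly this: the table lives in a DuckDB file on disk, and each reactive runs a small parameterized query instead of loading the table into R (table and column names here are just examples):

    library(DBI)
    library(duckdb)

    con <- dbConnect(duckdb::duckdb(),
                     dbdir = "data/expression.duckdb",
                     read_only = TRUE)

    # fetch only the rows for the gene currently selected in the UI
    res <- dbGetQuery(
      con,
      "SELECT cell_id, expression FROM counts WHERE gene = ?",
      params = list("ACTB")
    )

    dbDisconnect(con, shutdown = TRUE)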

Yes, you got it right. Thank you so much. I had a brief look, very interesting. I will not be able to remove the UMAPs - I need the feature plots there - but there is a lot to play with. Thanks again!