Working with a large dataset

Good morning,
I'm working with a large data set: a table of cluster-analysis data from HTS sequencing. I have a problem with RStudio constantly crashing, especially when I try to make plots or merge two data frames. Sometimes I get the crash error (session aborted) even when I just try to load the data in the console, so I am basically unable to proceed with my analyses.

Is there a way to work with such a data set using the tidyverse and vegan packages without RStudio crashing?

I'm using the latest versions of R (v4.0.2) and RStudio (v1.3.1093) on Ubuntu 18.04.5.

Thanks for the help.

What's the size of your data?
How much free memory do you have?
R is typically memory-bound, though there are packages that let you page memory to disk. You would have to Google around to find them; I don't remember the names right now.
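If you're not sure about either, a rough sketch like this will show you how big things are on the R side (`my_table` is just a placeholder for whatever object you load your data into):

```r
# Check how big a loaded object is and how much memory R is currently using.
# "my_table" is a placeholder name for your cluster table.
print(object.size(my_table), units = "Gb")  # size of the object in memory
gc()                                        # runs garbage collection and reports memory usage
```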

The data size is 2.5 GB and I have 3 GB of free memory on my computer.

You might really struggle to be effective without more headroom in the memory department :frowning:

You can look here: https://cran.r-project.org/web/views/HighPerformanceComputing.html
In the "Large memory and out-of-memory data" section, maybe something like the ff package would help you.
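For example, a minimal ff sketch might look like this (the file name is just an assumption about your cluster table):

```r
# Minimal sketch, assuming your cluster table is a CSV called
# "cluster_table.csv" (hypothetical name). ff keeps the data on disk
# and only pages chunks into RAM as needed.
library(ff)

big <- read.csv.ffdf(file = "cluster_table.csv", header = TRUE)

dim(big)    # dimensions, without loading the whole table into RAM
big[1:5, ]  # pulls just these rows into memory as a regular data frame
```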
Also, data.table might be a preferred way to manipulate the large data object, as it supports modification in place (typically objects are copied on mutation and tie up more memory).
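A rough data.table sketch, assuming a hypothetical "sample_id" key column shared between your cluster table and a metadata table (file and column names are just placeholders):

```r
# fread() is fast and frugal with memory, and := modifies columns
# in place instead of copying the whole table.
library(data.table)

clusters <- fread("cluster_table.csv")    # hypothetical file names
metadata <- fread("sample_metadata.csv")

clusters[, log_count := log1p(count)]     # "count" is an assumed column name

# A keyed join instead of base merge(), which copies both data frames
setkey(clusters, sample_id)
setkey(metadata, sample_id)
merged <- metadata[clusters]              # keeps every row of "clusters"
```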


OK @nirgrahamuk,
I will spend some time getting deeper into this topic.
Thank you so much for your help.
