I have a project that uses a dataset with approximately 5,000,000 rows and 13 columns. Is RStudio Cloud able to handle that sort of volume? Is there a desktop version that could be used instead of the cloud version? (The free version was not up to the task.) What RStudio product would be best suited for this task?
Any suggestions would be appreciated.
The Cloud Free plan of Posit Cloud that you are using is limited to 1 GB of RAM. Cloud Basic has 8 GB for $25 per month, and Cloud Standard has 32 GB for $75 per month.
If you install locally, the open source version of RStudio Desktop is free and limited only by the resources of your machine.
On my small laptop I just created a 5,000,000 x 13 matrix of random numbers, converted it to a data.table, and wrote the result to disk as a .csv file. It is 1.2 GB on disk, so the 5,000,000 x 13 dataset itself is not likely a problem.
Memory demands for analyses, etc., depend on what you are doing. With a large data set, probably the best strategy is to store the data in a database and pull in just the data you need; R provides packages that let you connect to most databases (see the sketch after the code below).
suppressMessages(library(data.table))
xx <- matrix(rnorm(65000000), ncol = 13)  # 5,000,000 rows x 13 columns of random numbers
DT <- as.data.table(xx)                   # columns are named V1 ... V13
DT[, mean(V1)]                            # quick check that computations on it run
fwrite(DT, "big.csv")                     # roughly 1.2 GB on disk
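As a minimal sketch of the database approach, here is one way to do it with DBI and RSQLite, reusing the DT object created above. The file name "big.sqlite", the table name "mydata", and the particular query are just placeholders for illustration; any database with a DBI backend (Postgres, MySQL, etc.) would follow the same pattern.

library(DBI)
library(RSQLite)

# Open (or create) a local SQLite database file -- hypothetical file name.
con <- dbConnect(RSQLite::SQLite(), "big.sqlite")

# Load the full data set into the database once.
dbWriteTable(con, "mydata", DT, overwrite = TRUE)

# Later, pull in only the columns and rows you actually need for an analysis,
# instead of holding all 5,000,000 x 13 values in RAM.
subset <- dbGetQuery(con, "SELECT V1, V2 FROM mydata WHERE V1 > 0 LIMIT 100000")
summary(subset$V1)

dbDisconnect(con)

The point is that the heavy storage lives on disk in the database, and R only ever sees the slice returned by each query.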