'Cannot allocate vector of size 500 kb'


If I have posted this in the wrong place, then please let me know so I can change it.

I am very new to RStudio, and unfortunately I have to use it to manipulate data for my master's dissertation (yes, I am being thrown in the deep end a little bit). I do know some of the basics, and luckily a script has been supplied by the person who compiled the dataset that I am using.

Unfortunately I have fallen at the first hurdle: loading the data. I get an error message: 'Error: cannot allocate vector of size 500 kb'. From looking at other forum posts, I know that this is a result of RAM limitations. However, the 500 kb value has me stumped. The dataset isn't that large, only 1 GB.

I've tried simple troubleshooting measures, such as running the 'gc()' command and freeing up as much RAM as possible on my laptop by closing other programmes. My laptop specs are:

Ryzen 5 3500U
8 GB RAM (2 GB hardware reserved).

Any ideas on how I can get this dataset to load?

Thanks in advance :slight_smile:

What file type is the data stored in?
Even better, what code do you use to try to read it?

Code is:

dataall2=read.csv("DATA/C5922DATASET13022017.csv", header=T,na.strings=c("NA", "-","?","<null>"),
                  stringsAsFactors=F,check.names=FALSE)

And, as the code shows, it is a .csv file.

That code didn't format correctly...
It looks like this in RStudio:

If I were in your position, I would use the data.table package's fread() function, which I would expect to be more robust for reading very large CSV files.

I encourage you to try it and see if you get a good result.
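
As a rough sketch of what that could look like, assuming the data.table package is installed: fread() accepts the same na.strings, stringsAsFactors and check.names arguments used in the read.csv() call, and data.table = FALSE asks it to return a plain data.frame rather than a data.table. The temporary file and its columns below are made up purely for illustration; swap in the real path.

```r
library(data.table)

# Stand-in file for illustration only (hypothetical columns);
# replace csv_path with the real file path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,site name,value",
             "1,A,10.5",
             "2,-,?",
             "3,C,<null>"), csv_path)

dataall2 <- fread(csv_path,
                  header = TRUE,
                  na.strings = c("NA", "-", "?", "<null>"),
                  stringsAsFactors = FALSE,
                  check.names = FALSE,
                  data.table = FALSE)  # return a plain data.frame

# Inspect column names and types before running anything else on it
str(dataall2)
```

Returning a data.frame is a deliberate choice here: code written around read.csv() output may behave differently on a data.table, so this keeps the swap as small as possible.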

Thank you for your suggestion. Would this still work with the formatting of a script that has already been written? The line I'm trying to run is one of several thousand, so any changes that would disrupt the others wouldn't be practical.

You could try reading, say, only the first 2000 records of your CSV, and compare the resulting data types and contents to those from running data.table's fread (set up with equivalent parameters).

dataall2=read.csv("DATA/C5922DATASET13022017.csv", header=T,na.strings=c("NA", "-","?","<null>"),
    stringsAsFactors=F,check.names=FALSE,nrows = 2000)
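
One way to do that comparison, as a sketch only: read the same first rows with both functions and put the column classes side by side. This assumes data.table is installed and that both readers return the same columns in the same order; the temporary file is a stand-in for the real CSV, and names such as base_head, fread_head and types are just illustrative.

```r
library(data.table)

# Stand-in file for illustration only; replace csv_path with the real file path.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5,
                     value = c(1.5, 2, NA, 4, 5),
                     label = letters[1:5]),
          csv_path, row.names = FALSE)

na_vals <- c("NA", "-", "?", "<null>")

base_head <- read.csv(csv_path, header = TRUE, na.strings = na_vals,
                      stringsAsFactors = FALSE, check.names = FALSE,
                      nrows = 2000)
fread_head <- fread(csv_path, header = TRUE, na.strings = na_vals,
                    stringsAsFactors = FALSE, check.names = FALSE,
                    nrows = 2000, data.table = FALSE)

# Column classes from each reader, side by side
types <- data.frame(column   = names(base_head),
                    read_csv = sapply(base_head, function(x) class(x)[1]),
                    fread    = sapply(fread_head, function(x) class(x)[1]),
                    row.names = NULL)

# Columns where the two readers disagree on type (ideally none)
types[types$read_csv != types$fread, ]

# Differences in contents, if any
all.equal(base_head, fread_head)
```

If both checks come back clean on the sample, that is reasonable (though not conclusive) evidence that the rest of the script will see the same structure from fread.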

There's also a skip= parameter you could use, so if you were dead set on not deviating from a read.csv approach, you could maybe split the CSV into chunks of however many thousand lines, read them in, and stitch them together into one table afterwards.
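
A minimal sketch of that chunked approach, using only base R. The chunk size, the temporary file and the variable names are placeholders for illustration. It assumes one record per line (no quoted fields containing line breaks), because skip= counts lines rather than records. Bear in mind the stitched table still has to fit in memory at the end, so this may not help if the full dataset is simply too big for the available RAM.

```r
# Stand-in file for illustration only; replace csv_path with the real file path.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) / 2),
          csv_path, row.names = FALSE)

na_vals <- c("NA", "-", "?", "<null>")
chunk_size <- 4  # hypothetical; for a real file use some thousands of rows

# Read the header line once so every chunk gets the same column names
col_names <- names(read.csv(csv_path, header = TRUE,
                            check.names = FALSE, nrows = 1))

chunks <- list()
rows_done <- 0
repeat {
  chunk <- tryCatch(
    read.csv(csv_path, header = FALSE, col.names = col_names,
             na.strings = na_vals, stringsAsFactors = FALSE,
             check.names = FALSE,
             skip = 1 + rows_done,  # 1 header line plus rows already read
             nrows = chunk_size),
    error = function(e) NULL        # read.csv errors once no lines are left
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunks[[length(chunks) + 1]] <- chunk
  rows_done <- rows_done + nrow(chunk)
  if (nrow(chunk) < chunk_size) break  # short chunk means end of file
}

# Stitch the chunks back into one table
dataall2 <- do.call(rbind, chunks)
```

Because each chunk has its column types guessed separately, it is worth checking str(dataall2) afterwards; passing colClasses= to read.csv would pin the types if they turn out to differ between chunks.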
