Query on a 1-million-row CSV file times out

I tried running a query on a dataset in a CSV file with 1 million rows, and the query times out. Would upgrading to a professional version with 16 GB of RAM help?


Mike Williams

Can you share an example of your code and some sample data? Unless you have an extremely wide dataset, 1 million records should fit comfortably on pretty much every laptop.
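As a rough check on the memory claim, here is a back-of-the-envelope sketch in base R. The width of 10 numeric columns is an assumption for illustration only; nothing in the question states the real shape, and text columns would take more space than this estimate.

```r
# Rough memory estimate for a purely numeric table.
# Hypothetical shape: 1 million rows by 10 columns (assumed, not from the question).
n_rows <- 1e6
n_cols <- 10
bytes_per_double <- 8  # R stores numeric values as 8-byte doubles

est_mb <- n_rows * n_cols * bytes_per_double / 1024^2
est_mb  # about 76 MB, far below 16 GB
```

Once the file is actually loaded, `object.size()` on the resulting table gives the real figure.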

I'm running it in the cloud environment, but I tried running it on my desktop as well. It works in the cloud environment with a 1,500-line dataset.

# Run the search algorithm on the CNN data

search_res_CNN <- merge(search_dtCNN[, id:=1L], search_for[, id:=1L], by="id", allow.cartesian=TRUE)[,
match:=corpusCNN %like% word, by=.(corpusCNN, word, value)][
match==TRUE, .(words=paste(sort(word), collapse=", "), "CNN Insider Word Score"=sum(value)), by=corpusCNN]

search_res_CNN <- merge(search_dtCNN[, -"id"], search_res_CNN, by="corpusCNN", all.x=TRUE)

It would be really helpful to have a sample of your data too, so that we can try and reproduce the issue that you are having. Without knowing anything about your dataset, the most obvious place to check would be whether you really want allow.cartesian to be TRUE.
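To make the allow.cartesian concern concrete: merging on a constant id pairs every row of search_dtCNN with every row of search_for, so 1 million rows and, say, 100 search words would produce 100 million intermediate rows before any matching happens. The sketch below shows one possible way to avoid building that table, by looping over the search words and doing one vectorised match per word. The toy data are made up for illustration, the result column is called `score` here instead of "CNN Insider Word Score", and the sketch assumes each corpusCNN value appears only once.

```r
library(data.table)

# Toy stand-ins for the real tables (contents invented for illustration).
search_dtCNN <- data.table(corpusCNN = c("stocks rally on earnings",
                                         "weather update",
                                         "earnings miss hits stocks"))
search_for <- data.table(word = c("stocks", "earnings"), value = c(2, 3))

# Accumulators: a running score and the matched words for each corpus row.
score_vec <- numeric(nrow(search_dtCNN))
hits <- vector("list", nrow(search_dtCNN))

# One pattern match per search word, instead of rows x words merged rows.
for (i in seq_len(nrow(search_for))) {
  w <- search_for$word[i]
  m <- which(search_dtCNN$corpusCNN %like% w)  # regex match, as in the original code
  score_vec[m] <- score_vec[m] + search_for$value[i]
  hits[m] <- lapply(hits[m], c, w)
}

# Rows without any match get NA, mirroring the all.x=TRUE merge.
words_vec <- vapply(hits, function(h) {
  if (length(h) == 0L) NA_character_ else paste(sort(h), collapse = ", ")
}, character(1))
score_vec[lengths(hits) == 0L] <- NA_real_

search_res_CNN <- copy(search_dtCNN)
search_res_CNN[, `:=`(words = words_vec, score = score_vec)]
```

Memory use here grows with the number of rows, not rows times words, so it may be worth trying before paying for more RAM.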
