Query on a 1-million-row CSV file times out

flexman71 · February 19, 2022, 10:13pm

I tried running a query on a dataset from a CSV file with 1 million rows, and the query times out. Would upgrading to a professional version with 16 GB of RAM help?

Thanks,

Mike Williams

dvetsch75 · February 20, 2022, 2:26am

Can you share an example of your code and some sample data? Unless you have an extremely wide dataset, 1 million records should fit comfortably on pretty much every laptop.
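A quick way to check the memory claim on your own machine is to measure the size of a table of comparable shape. This is only a rough sketch: the two columns below are made-up stand-ins, and a real file loaded with `fread()` could be wider or hold longer strings.

```r
library(data.table)

# Hypothetical stand-in: 1 million rows with an integer column and a short text column
n <- 1e6
dt <- data.table(id = seq_len(n),
                 text = sprintf("headline number %d", seq_len(n)))

# In-memory size of the table, in megabytes
size_mb <- as.numeric(object.size(dt)) / 1024^2
print(format(object.size(dt), units = "MB"))
```

If the reported size is far below the available RAM, the timeout is more likely caused by the query itself than by the data volume.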

flexman71 · February 20, 2022, 6:15pm

I'm running it in the cloud environment, but I tried running it on my desktop as well. It works in the cloud environment with a 1,500-line dataset.

This runs the search algorithm on the CNN data:

search_res_CNN <- merge(search_dtCNN[, id := 1L], search_for[, id := 1L], by = "id", allow.cartesian = TRUE)[,
  match := corpusCNN %like% word, by = .(corpusCNN, word, value)][
  match == TRUE, .(words = paste(sort(word), collapse = ", "), "CNN Insider Word Score" = sum(value)), by = corpusCNN]

search_res_CNN <- merge(search_dtCNN[, -"id"], search_res_CNN, by = "corpusCNN", all.x = TRUE)

dvetsch75 · February 21, 2022, 2:35pm

It would be really helpful to have a sample of your data too, so that we can try to reproduce the issue you are having. Without knowing anything about your dataset, the most obvious thing to check is whether you really want allow.cartesian to be TRUE.
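To make the allow.cartesian concern concrete: joining on a constant id pairs every row of search_dtCNN with every row of search_for, so 1 million texts and, say, 100 search words would produce 100 million intermediate rows, each then matched in its own by group. Below is a minimal sketch of one possible alternative, not a drop-in fix. It assumes the column names from the post above (corpusCNN, word, value), uses made-up toy data, shortens the score column to a hypothetical `score`, and assumes plain substring matching (`fixed = TRUE`) is acceptable where `%like%` treats the word as a regular expression. It runs one vectorised `grepl()` per search word and keeps only the rows that match.

```r
library(data.table)

# Toy stand-ins for the real tables (hypothetical data; column names from the post)
search_dtCNN <- data.table(corpusCNN = c("stocks rally on earnings",
                                         "weather update",
                                         "earnings miss hits stocks hard"))
search_for <- data.table(word = c("stocks", "earnings", "rally"),
                         value = c(2, 3, 1))

# One vectorised grepl() per search word; only matching rows are kept,
# so the full texts-by-words cross product is never materialised.
hits <- rbindlist(lapply(seq_len(nrow(search_for)), function(i) {
  matched <- grepl(search_for$word[i], search_dtCNN$corpusCNN, fixed = TRUE)
  data.table(corpusCNN = search_dtCNN$corpusCNN[matched],
             word = rep(search_for$word[i], sum(matched)),
             value = rep(search_for$value[i], sum(matched)))
}))

# Collect the matched words and sum their scores per text
search_res_CNN <- hits[, .(words = paste(sort(word), collapse = ", "),
                           score = sum(value)),
                       by = corpusCNN]

# Left join so that texts without any match are kept, with NA in the new columns
search_res_CNN <- merge(search_dtCNN, search_res_CNN, by = "corpusCNN", all.x = TRUE)
print(search_res_CNN)
```

The loop still scans every text once per search word, but each scan is a single vectorised call and the intermediate table grows only with the number of actual matches.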
