url.exists() loop gets stuck

Hello everyone,

I want to access a Google Cloud bucket that contains up to 5000 datasets. They are numbered sequentially, but some may be missing, so I want a script that iterates through them and checks whether each URL exists. I created a simple loop:

# url.exists() comes from the RCurl package
for (i in seq_along(x)) {
  print(paste0("loop is at index ", x[i]))
  # paste0() avoids the space that paste() would insert before the index
  output$exists[i] <- url.exists(paste0("url_", x[i]), .header = FALSE)
}

The files are named "url_1", "url_2", etc., so I can just paste an increasing sequence of numbers onto the base URL. However, after around 300 checks the loop hangs and R itself becomes more or less unresponsive. I wondered whether it might be some kind of spam protection, but if I force-quit R and restart it, it immediately starts working again, so I suspect something else is going on.
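One way to keep a single slow request from freezing the whole loop is to give each check its own timeout and catch errors, so a stuck request fails fast instead of hanging R. A sketch using the httr package instead of RCurl; the base URL is a placeholder, since the real bucket link can't be shared:

```r
library(httr)

# Placeholder base URL; substitute the real bucket address
base <- "https://example.com/url_"

url_ok <- function(i) {
  res <- tryCatch(
    HEAD(paste0(base, i), timeout(10)),  # HEAD request; give up after 10 s
    error = function(e) NULL             # timeouts/DNS errors become NULL
  )
  # Treat any 2xx/3xx response as "exists"
  !is.null(res) && status_code(res) < 400
}

exists_flags <- vapply(1:50, url_ok, logical(1))
```

A HEAD request only fetches the response headers, so no dataset bodies are downloaded just to test for existence.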

Any suggestions on how I could fix this?

Thanks
Jonas

P.S. unfortunately I cannot share the link to the bucket.

Hi @jonas2,
Maybe the server is being overwhelmed by your requests, or it is actively blocking clients that make too many. How about pausing for a few seconds after each batch of, say, 20 requests?

for (i in seq_along(x)) {
  print(paste0("loop is at index ", x[i]))
  output$exists[i] <- url.exists(paste0("url_", x[i]), .header = FALSE)
  # Pause after every 20th request to avoid hammering the server
  if (i %% 20 == 0) { cat("Pausing...\n"); Sys.sleep(5) }
}

Hi Davo
Thanks for the idea, but unfortunately that doesn't help either. It feels like the issue is with RCurl itself. Maybe it does not properly close its connections (sometimes, when I read the tables, R starts closing unused connections after a while; I don't know how that works under the hood, though).
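If RCurl really is leaking connections, one workaround (a sketch, not tested against the actual bucket) is to switch to the curl package and reuse a single handle for every request, so the loop never accumulates open connections:

```r
library(curl)

# One handle reused for every request. nobody = TRUE makes libcurl send a
# HEAD-style request (no body downloaded); timeout caps each attempt at 10 s.
h <- new_handle(nobody = TRUE, timeout = 10)

url_ok <- function(u) {
  res <- tryCatch(
    curl_fetch_memory(u, handle = h),
    error = function(e) NULL  # network errors/timeouts become NULL
  )
  !is.null(res) && res$status_code < 400
}

# "https://example.com/url_" stands in for the real bucket URL
output$exists <- vapply(paste0("https://example.com/url_", x), url_ok, logical(1))
```

Reusing one handle also lets libcurl keep the TCP connection alive between checks, which should be faster than opening a fresh connection 5000 times.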
