I want to access a Google Cloud Storage bucket that contains up to 5000 datasets. They are numbered in order, but some might be missing, so I want to write a script that runs through the bucket and checks whether each URL exists. I created a simple loop:
library(RCurl)  # provides url.exists()

for (i in seq_along(x)) {
  print(paste0("loop is at index ", x[i]))
  # paste0 (no separator) matches the "url_1", "url_2", ... naming
  output$exists[i] <- url.exists(paste0("url_", x[i]), .header = FALSE)
}
The data is in the format "url_1", "url_2", etc., so I can just take a vector of increasing numbers and paste it onto the base URL. However, after around 300 checks the loop gets stuck, and in fact R itself becomes unresponsive. I wondered whether it was some kind of spam protection, but if I force-quit R and restart, it immediately starts working again, so I suspect something else is going on.
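For illustration ("url_" stands in for the real bucket prefix, which I can't share), the full list of candidate URLs can be built in one vectorized call:

x <- 1:5000                # dataset indices
urls <- paste0("url_", x)  # "url_1", "url_2", ..., "url_5000"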
Any suggestions on how I could fix it?
Thanks
Jonas
P.S. Unfortunately, I cannot share the link to the bucket.
Hi @jonas2,
Maybe the server is being overwhelmed by your requests, or it is actively blocking clients that send too many.
How about pausing for a few seconds after each batch of, say, 20 requests?
library(RCurl)

for (i in seq_along(x)) {
  print(paste0("loop is at index ", x[i]))
  output$exists[i] <- url.exists(paste0("url_", x[i]), .header = FALSE)
  # after every 20th request, pause for 5 seconds to go easier on the server
  if (i %% 20 == 0) { cat("Pausing... \n"); Sys.sleep(5) }
}
Hi Davo
Thanks for your idea, but unfortunately it doesn't work either. It feels like the issue lies with RCurl itself. Maybe it does not properly close its connections (sometimes, when I read tables after a while, R starts closing open connections; I don't know how that works under the hood, though).
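In the meantime, here is a minimal sketch of a workaround I might try, assuming the httr package is available (and with "url_" again standing in for the real bucket prefix): HEAD requests with a timeout, so a single hung connection fails fast instead of freezing the whole session.

library(httr)

url_ok <- function(u) {
  # HEAD skips downloading the body; tryCatch records timeouts and
  # network errors as FALSE instead of aborting the whole loop
  res <- tryCatch(HEAD(u, timeout(5)), error = function(e) NULL)
  !is.null(res) && status_code(res) < 400
}

for (i in seq_along(x)) {
  print(paste0("loop is at index ", x[i]))
  output$exists[i] <- url_ok(paste0("url_", x[i]))
}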