I am new here, so apologies if I make noob mistakes.
I am trying to build a database by web-scraping multiple HTML pages. The page URL is the same for every trial except for a number that changes.
The URL is something like this: http://ctri.nic.in/Clinicaltrials/pmaindet2.php?trialid=# (CTRI), where # = any number.
Below is my code, which works perfectly fine for a single URL on its own. But when I scale it up to read multiple pages through a loop, it gives me an error. I tried configuring my proxy details and the error still persists.
library(httr)
library(rvest)
library(dplyr)
library(stringr)
library(data.table)   # for %like%
library(DBI)

set_config(use_proxy(url = "http://proxy.*******.ac.in", port = ****))

ids = c(1:10)
for (i in seq_along(ids)) {
  # build the URL for trial number i
  myurl = paste0("http://ctri.nic.in/Clinicaltrials/pmaindet2.php?trialid=", i)
  if (http_error(myurl)) next

  # save a local copy, then parse the page
  download.file(myurl, destfile = "mypage.html", quiet = TRUE)
  pointer = read_html(myurl)

  # the registration details sit in the first table of <tr> rows; drop its header row
  webpage = html_nodes(pointer, "tr")
  webpage = html_table(webpage)
  webpage = webpage[[1]]
  webpage = webpage[-1, ]

  # pull the CTRI number, registration type and registration date
  commonTable = list()
  commonTable$CTRI_Number = as.character(webpage %>%
    filter(X1 %like% "CTRI Number") %>%
    select(X2) %>%
    str_extract_all("\\D+\\/\\d+\\/\\d+\\/\\d+"))
  commonTable$reg_type = as.character(webpage %>%
    filter(X1 %like% "CTRI Number") %>%
    select(X2))
  commonTable$Registered_on = as.character(webpage %>%
    filter(X1 %like% "CTRI Number") %>%
    select(X2) %>%
    str_extract_all("\\d{2}\\/\\d{2}\\/\\d{4}"))

  reg_details = data.frame(CTRI_Number = commonTable$CTRI_Number,
                           Registered_on = commonTable$Registered_on,
                           reg_type = commonTable$reg_type)

  # append to the database table and to a CSV (mydb is my existing DBI connection)
  dbWriteTable(mydb, "reg_details", reg_details, append = TRUE)
  write.table(reg_details, "reg_details.csv", sep = ",", row.names = FALSE,
              col.names = !file.exists("reg_details.csv"), append = TRUE)
}
I'm getting this error:
`Error in open.connection(x, "rb") : Timeout was reached: [ctri.nic.in] Connection timed out after 10000 milliseconds`
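In case it is relevant, this is the kind of wrapper I was considering to work around the timeout. It is only a rough sketch: `fetch_trial_page` is a name I made up, and I am assuming `httr::RETRY()` and `httr::timeout()` are appropriate tools here.

```r
library(httr)
library(rvest)

# Rough sketch (assumed approach, not tested): request each page with a
# longer timeout and a few retries, and return NULL instead of stopping
# the whole loop when a request still fails.
fetch_trial_page <- function(trial_id) {
  url <- paste0("http://ctri.nic.in/Clinicaltrials/pmaindet2.php?trialid=", trial_id)
  resp <- tryCatch(
    RETRY("GET", url, timeout(60), times = 3, pause_base = 2),
    error = function(e) NULL
  )
  if (is.null(resp) || http_error(resp)) return(NULL)
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}

# Usage: skip trial ids whose page still cannot be fetched
for (i in 1:10) {
  pointer <- fetch_trial_page(i)
  if (is.null(pointer)) next
  # ... parse the tables as before ...
  Sys.sleep(1)  # small pause between requests
}
```

I am not sure whether this is the right direction or whether the problem is with my proxy settings.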
Please help. Thank you.