I'm conducting a web-scraping project, but I've run into issues with the code. I get this error when running the code below, specifically at the sapply call:

Timeout was reached: [historico.presidencia.gov.co] Connection timed out after 10000 milliseconds
I assume it's because the scraping loop hits the time limit before it can finish. Usually I can brute-force the program into working, but the issue has been so consistent that it has become quite frustrating. I have tried using the timeout() function from httr, but I have had no success.
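What I tried looked roughly like this (reconstructed from memory; the exact arguments may have differed):

# My httr attempt (roughly): raise the timeout, then parse the response body
resp <- httr::GET(url, httr::timeout(60))   # url = one of the speech URLs built below
page <- read_html(httr::content(resp, as = "text", encoding = "UTF-8"))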
Any help is appreciated.
library(rvest)
library(tidyverse)
library(httr)
months <- tolower(c("Enero","Febrero","Marzo","Abril","Mayo","Junio","Julio","Agosto","Septiembre","Octubre","Noviembre","Diciembre"))
Uribe_index_2003_urls <- paste0("http://historico.presidencia.gov.co/discursos/discursos2003/",
                                months, "/", months, "2003.htm")
search_month <- 7   # July
url_one <- read_html(Uribe_index_2003_urls[search_month])   # month index page
url_two <- html_nodes(url_one, "a.tituloscentro")           # links to individual speeches
url_three <- html_attr(url_two, "href")
url_four <- url_three[which(!str_detect(url_three, "^\\."))]   # drop relative "./" links
speech_url <- paste0("http://historico.presidencia.gov.co/discursos/discursos2003/", months[search_month], "/", url_four)
speech_url
sample_url <- speech_url[2]   # one speech URL for testing
get_speech <- function(sample_url) {
  one <- read_html(sample_url)                        # fetch the speech page
  two <- html_nodes(one, "p.parrafos")                # speech paragraphs
  three <- html_text(two)
  four <- str_replace_all(three, "(\r|\n\\s*)", "")   # strip carriage returns and line breaks
  five <- paste(four, collapse = " ")                 # collapse into one string
  Sys.sleep(2)                                        # pause between requests
  return(five)
}
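A single call like this usually completes fine; it's only over the full loop that the timeouts pile up:

get_speech(sample_url)   # returns the full text of one speech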
##### Here is where things become problematic. Once it does complete, however, everything is fine; it's the sapply that is the problem.
speeches_July_2003 <- sapply(speech_url, get_speech)
#####
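One workaround I've been sketching (untested, and get_speech_safely is just a name I made up) wraps each request in tryCatch with a few retries and longer curl timeouts, so a single slow response doesn't abort the whole sapply:

# Untested sketch: retry each URL a few times with longer timeouts,
# returning NA_character_ if every attempt fails.
# get_speech_safely is a hypothetical helper, not from any package.
get_speech_safely <- function(url, retries = 3) {
  for (i in seq_len(retries)) {
    result <- tryCatch({
      resp <- httr::GET(url,
                        httr::timeout(60),                   # total request timeout
                        httr::config(connecttimeout = 60))   # connection-phase timeout (the error mentions the connection timing out)
      page <- read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
      text <- html_text(html_nodes(page, "p.parrafos"))
      paste(str_replace_all(text, "(\r|\n\\s*)", ""), collapse = " ")
    }, error = function(e) NULL)
    if (!is.null(result)) {
      Sys.sleep(2)          # keep the politeness delay between requests
      return(result)
    }
  }
  NA_character_             # give up on this URL after all retries
}
speeches_July_2003 <- sapply(speech_url, get_speech_safely)   # would replace the sapply above

I'm not sure whether retrying like this is the right approach, though.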
speeches_2003_July.df <- data.frame("year" = 2003, "month" = search_month, "text" = speeches_July_2003)