Need help scraping multiple speeches from a particular URL.
It's basically an archive of transcribed speeches, with subheadings for the year and month, plus a unique URL for each speech. I've been trying to scrape all the speeches for a particular month, but I'm struggling because the speech URLs follow no particular pattern, so I can't build a vector of URLs to iterate over and scrape.
library(stringr)
library(rvest)
library(purrr)  # provides map_df()
url_base <- "http://historico.presidencia.gov.co/discursos/discursos2010/enero/%d.html"
The original URL ends in /archivo.html, but I wanted to try making the filename dynamic with sprintf().

### Here's the code I am trying to work with:
# speech_ids is the missing piece: I don't know what values to loop over,
# since the speech filenames don't follow a numeric sequence.
Uribe2ndTerm <- map_df(speech_ids, function(i) {
  page <- read_html(sprintf(url_base, i))
  data.frame(speech = html_text(html_nodes(page, "td.parrafo[align='left']")))
})
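Since the filenames follow no sequence, one approach (a sketch only, not tested against the live site) is to harvest the speech URLs from the month's archive page itself and feed those into `map_df()`. This assumes the `archivo.html` page links to each speech with an href ending in `.html`:

```r
library(rvest)
library(purrr)

# Assumed: the month's archive page links to every speech for that month.
archive_url <- "http://historico.presidencia.gov.co/discursos/discursos2010/enero/archivo.html"
archive <- read_html(archive_url)

# Collect every link on the archive page, keep only the .html speech pages,
# and resolve relative hrefs against the archive URL.
links <- html_attr(html_nodes(archive, "a"), "href")
speech_urls <- url_absolute(links[grepl("\\.html$", links)], archive_url)

# Scrape each speech into one data frame.
Uribe2ndTerm <- map_df(speech_urls, function(u) {
  page <- read_html(u)
  data.frame(speech = html_text(html_nodes(page, "td.parrafo[align='left']")),
             stringsAsFactors = FALSE)
})
```

The link-filtering pattern (`\\.html$`) is a guess; you may need to narrow it if the archive page links to other pages besides the speeches.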
#### Here's the code used for an individual speech, as a reference.
example_url <- "http://historico.presidencia.gov.co/discursos/discursos2009/septiembre/religion_29092009.html"
one <- read_html(example_url)                        # fetch the page
two <- html_nodes(one, "td.parrafo[align='left']")   # select the speech paragraphs
three <- html_text(two)                              # extract the text
four <- str_replace_all(three, "(\r|\n\\s*)", "")    # strip carriage returns and newlines (\s must be escaped as \\s in an R string)
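The per-speech steps above can be folded into a small helper (hypothetical name `scrape_speech`), which keeps the month loop tidy, whatever vector of URLs it ends up iterating over:

```r
library(rvest)
library(stringr)

# Hypothetical helper: fetch one speech page and return its cleaned text.
scrape_speech <- function(url) {
  page <- read_html(url)
  raw  <- html_text(html_nodes(page, "td.parrafo[align='left']"))
  str_replace_all(raw, "(\r|\n\\s*)", "")  # remove carriage returns and newlines
}

example_url <- "http://historico.presidencia.gov.co/discursos/discursos2009/septiembre/religion_29092009.html"
speech_text <- scrape_speech(example_url)
```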
Any help would be greatly appreciated.