Need help scraping multiple speeches from a particular URL.
It's basically an archive of transcribed speeches, with subheadings for the year and month, plus a unique URL for each speech. I've been trying to scrape all the speeches for a particular month, but I'm struggling because the speech URLs follow no particular pattern, so I can't build a vector of URLs to iterate over and scrape.
library(stringr)
library(rvest)
library(purrr)  # provides map_df()
url_base <- "http://historico.presidencia.gov.co/discursos/discursos2010/enero/%d.html"
The original URL ends in /archivo.html, but I wanted to try making the filename dynamic with sprintf().

### Here's the code I am trying to work with:
# speech_ids is the missing piece: I don't know what values to loop over,
# since the speech filenames don't follow a numeric sequence.
Uribe2ndTerm <- map_df(speech_ids, function(i) {
  page <- read_html(sprintf(url_base, i))
  data.frame(speech = html_text(html_nodes(page, "td.parrafo[align='left']")))
})
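Since the filenames follow no sequence, one approach (a sketch only, not tested against the live site) is to harvest the speech URLs from the month's archive page itself and feed those into `map_df()`. This assumes the `archivo.html` page links to each speech with an href ending in `.html`:

```r
library(rvest)
library(purrr)

# Assumed: the month's archive page links to every speech for that month.
archive_url <- "http://historico.presidencia.gov.co/discursos/discursos2010/enero/archivo.html"
archive <- read_html(archive_url)

# Collect every link on the archive page, keep only the .html speech pages,
# and resolve relative hrefs against the archive URL.
links <- html_attr(html_nodes(archive, "a"), "href")
speech_urls <- url_absolute(links[grepl("\\.html$", links)], archive_url)

# Scrape each speech into one data frame.
Uribe2ndTerm <- map_df(speech_urls, function(u) {
  page <- read_html(u)
  data.frame(speech = html_text(html_nodes(page, "td.parrafo[align='left']")),
             stringsAsFactors = FALSE)
})
```

The link-filtering pattern (`\\.html$`) is a guess; you may need to narrow it if the archive page links to other pages besides the speeches.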
#### Here's the code used for an individual speech, as a reference.
example_url <- "http://historico.presidencia.gov.co/discursos/discursos2009/septiembre/religion_29092009.html"
one <- read_html(example_url)                        # fetch the page
two <- html_nodes(one, "td.parrafo[align='left']")   # select the speech paragraphs
three <- html_text(two)                              # extract the text
four <- str_replace_all(three, "(\r|\n\\s*)", "")    # strip carriage returns and newlines (\s must be escaped as \\s in an R string)
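The per-speech steps above can be folded into a small helper (hypothetical name `scrape_speech`), which keeps the month loop tidy, whatever vector of URLs it ends up iterating over:

```r
library(rvest)
library(stringr)

# Hypothetical helper: fetch one speech page and return its cleaned text.
scrape_speech <- function(url) {
  page <- read_html(url)
  raw  <- html_text(html_nodes(page, "td.parrafo[align='left']"))
  str_replace_all(raw, "(\r|\n\\s*)", "")  # remove carriage returns and newlines
}

example_url <- "http://historico.presidencia.gov.co/discursos/discursos2009/septiembre/religion_29092009.html"
speech_text <- scrape_speech(example_url)
```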
Any help would be greatly appreciated.