Hello,
My new hobby is using R, and I'm enjoying it. I don't have a CE or CS background, so some of my questions are hard to articulate in web searches. I collected around 200 URLs and put them into a data frame. I'd like to feed those URLs into rvest one at a time, in the style of my code below, so that I end up with one big data frame containing all the text I'm trying to scrape from the various pages.
library(rvest)  # provides read_html() and re-exports the %>% pipe

# Works when one_url is a single URL string (not my whole data frame):
pages <- read_html(one_url) %>%
  html_elements("The Elements of Interest") %>%
  html_text()
If I feed one page into the code above, with the appropriate elements, I get what I want: the text from the website, which I can later use for word counts, ggplot2 graphs, and a word cloud.
However, I'm struggling to understand how I can use read_html to go through each successive URL in my data frame. After a few hours of searching on Stack Exchange, I'm guessing I need some type of for loop, or lapply. I'm just looking for some other input before I dedicate time to learning either for loops or the lapply function (which I gather is part of base R, not a separate package).
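For concreteness, here's a rough sketch of what I imagine the lapply version might look like, pieced together from my searching. The names df_links$url and scrape_one are just placeholders (my URL column and a hypothetical helper function), and the selector stands in for the real elements of interest:

library(rvest)  # read_html(), html_elements(), html_text(); also re-exports %>%

# Hypothetical helper: scrape the text of interest from one URL
scrape_one <- function(u) {
  read_html(u) %>%
    html_elements("The Elements of Interest") %>%
    html_text()
}

# One character vector of scraped text per URL
all_text <- lapply(df_links$url, scrape_one)

# Stack everything into one big data frame, keeping track of the source URL
scraped <- data.frame(
  url  = rep(df_links$url, lengths(all_text)),
  text = unlist(all_text),
  stringsAsFactors = FALSE
)

Does something like that look like the right direction, or would a plain for loop be the better thing to learn first?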
Thanks for any input.