Hello! I am trying to scrape all the posts on a forum website, working from earlier examples found in this community.
However, the last step fails with this error message:
Error in mutate():
In argument: messages = map(thread_links, scrape_messages).
Caused by error in map():
In index: 1.
Caused by error:
! './viewforum.php?f=84&sid=616e59608b95e1467d15352e8a3ffe77' does not exist in current working directory.
Could someone please explain what went wrong? Thank you very much! My code is below.
# Load packages
library(rvest)
library(dplyr)
library(stringr)
library(purrr)
# Scrape thread titles, thread links, authors and number of views
h <- read_html("https://forum.singaporeexpats.com/viewforum.php?f=13&sid=597c6ea1f18d07ad8a8a7e304a78e00b")
threads <- h %>%
  html_nodes("#page-body .list-inner a") %>%
  html_text()
thread_links <- h %>%
  html_nodes("#page-body .list-inner a") %>%
  html_attr(name = "href")
# Custom function to scrape messages in each thread
scrape_messages <- function(thread_link) {
  read_html(thread_link) %>%
    html_nodes(css = ".content") %>%
    html_text() %>%
    str_squish()
}
# Create master dataset (and scrape messages in each thread in process)
master_data <-
  tibble(threads, thread_links) %>%
  mutate(messages = map(thread_links, scrape_messages)) %>%
  select(threads, messages, thread_links)
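A likely reading of the error: the href values scraped from the page are relative (they start with "./"), so read_html() treats each one as a local file path rather than a web address. The first link is also a viewforum.php link rather than a thread, which suggests the selector picks up more than thread links. A minimal sketch of resolving a relative link against the page URL is below. It assumes xml2 is installed (rvest depends on it); base_url, relative_link, absolute_link and scrape_messages_abs are names made up for this illustration, not part of the code above.

```r
library(rvest)    # read_html(), html_nodes(), html_text(), and the %>% pipe
library(xml2)     # url_absolute() resolves relative URLs against a base
library(stringr)  # str_squish()

# Page the links were scraped from (same URL as in the code above).
base_url <- "https://forum.singaporeexpats.com/viewforum.php?f=13&sid=597c6ea1f18d07ad8a8a7e304a78e00b"

# Relative href copied from the error message.
relative_link <- "./viewforum.php?f=84&sid=616e59608b95e1467d15352e8a3ffe77"

# Turn the relative href into a full URL that read_html() can fetch.
absolute_link <- url_absolute(relative_link, base_url)
print(absolute_link)

# Hypothetical variant of scrape_messages() that resolves the link first.
# Defined only; calling it would hit the network.
scrape_messages_abs <- function(thread_link, base = base_url) {
  read_html(url_absolute(thread_link, base)) %>%
    html_nodes(css = ".content") %>%
    html_text() %>%
    str_squish()
}
```

If this is the cause, passing scrape_messages_abs to map() in place of scrape_messages should get past the "does not exist" error. Filtering thread_links down to actual thread pages first (for example with str_detect() on a pattern that matches the forum's thread URLs) may also be needed, since the current selector appears to return subforum links as well.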