I am trying to scrape some links from the body of each page on this website.
I am able to retrieve the links with this full_url for the first page, but when I try to parse the next few pages using the query argument page=, it does not work. I am not sure what is going wrong; any help would be appreciated.
I am using the code below:
library(rvest)

# full_url is the address of the first page (defined earlier)
first_page <- read_html(full_url)

# Select all links inside table cells
td_links <- html_nodes(first_page, "td a")
td_links

# Step 3: Extract the href attributes
hrefs <- html_attr(td_links, "href")
unique(hrefs)
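
For the following pages I am attempting roughly this (a sketch of my approach; the page number 2 is just illustrative of how I build the URLs):

next_url <- paste0(full_url, "?page=2")
next_page <- read_html(next_url)
html_attr(html_nodes(next_page, "td a"), "href")  # this is the part that does not work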
Welcome! It would likely work in your favor if you could provide some more details, e.g. your current iteration logic and the code that generates or extracts the URLs for consecutive pages, as well as the exact error messages instead of "does not work". A fully reproducible example would be ideal.
One probable suspect is your request rate, especially if you have not limited it in any way. Your crafted request might also not be quite valid, or you may need to reuse the same session while you crawl (rvest::session(), or other means of sending the provided cookies with your requests); a minimal sketch follows.
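
Something along these lines might be a starting point (a rough sketch, not a tested solution: the base URL, the number of pages, and the 2-second delay are all placeholder assumptions):

library(rvest)

base_url <- "https://example.com/listing"  # placeholder, substitute your real URL
s <- session(base_url)                     # one session, so any cookies persist

all_hrefs <- character(0)
for (page in 1:5) {                        # assuming 5 pages; adjust to the site
  s <- session_jump_to(s, paste0(base_url, "?page=", page))
  doc <- read_html(s)                      # parse the page the session is now on
  all_hrefs <- c(all_hrefs, html_attr(html_nodes(doc, "td a"), "href"))
  Sys.sleep(2)                             # throttle requests between pages
}
unique(all_hrefs)

The session carries cookies across requests, and the Sys.sleep() call keeps the request rate modest; both address the suspects mentioned above.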