I have a list of 800 URLs. I want to scrape the element .breadcrumb from these pages. When I run a test with 50 pages, everything works. When I run the full list of 800 URLs, I get the error: "Error in open.connection(x, "rb") : HTTP error 403."
This is my code:
library(rvest)
library(purrr)
# Read the CSV #
websites <- read.csv("websites.csv", sep = ";")
View(websites)
# Make a list of URLs (named urls so it does not mask base::list) #
urls <- as.list(as.character(websites$URL))
# Scrape all pages #
breadcrumbs <- urls %>%
  map(read_html) %>%
  map(html_node, ".breadcrumb") %>%
  map_chr(html_text)
Error in open.connection(x, "rb") : HTTP error 403.
How can I fix this?
UPDATE:
Some of the URLs are Drupal pages that I have no access to, and these cause the 403 error. How can I set this up in R so that those pages are ignored?
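One possible way to skip the inaccessible pages, as a minimal sketch rather than a definitive fix: wrap the per-page scrape in purrr's possibly(), so that any error (including HTTP 403) returns NA instead of stopping the whole run. The helper names scrape_breadcrumb, safe_breadcrumb and scrape_all are hypothetical and not part of the original code.

```r
library(rvest)
library(purrr)

# Hypothetical helper: scrape the .breadcrumb text of a single page.
scrape_breadcrumb <- function(url) {
  page <- read_html(url)
  html_text(html_node(page, ".breadcrumb"))
}

# Same function, but any error (such as HTTP 403) yields NA
# instead of aborting the whole run.
safe_breadcrumb <- possibly(scrape_breadcrumb, otherwise = NA_character_)

# Hypothetical wrapper: apply the safe scraper to every URL.
# Pages that failed show up as NA in the result.
scrape_all <- function(urls) {
  map_chr(urls, safe_breadcrumb)
}
```

With the data from the question this would be called as breadcrumbs <- scrape_all(as.character(websites$URL)). The URLs that were skipped can then be found with which(is.na(breadcrumbs)), although that also includes pages that loaded but have no .breadcrumb element.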