I am trying to build a function that scrapes information about Amazon users, such as reviewer ranking and number of helpful votes. However, when I run the function, it returns an empty table with no rows.
The code is below:
#### Loading packages ####
library(tidyverse)
library(rvest)
#### Function to scrape user information ####
scrape_user <- function(user_id) {
  url_user <- paste0("https://www.amazon.com/gp/profile/amzn1.account.", user_id)
  doc <- read_html(url_user) # Assign results to `doc`

  # Reviewer Ranking
  reviewer_ranking <- doc %>%
    html_nodes("[class='a-size-base']") %>%
    html_text()

  # Helpful Votes
  helpful_votes <- doc %>%
    html_nodes("[class='a-size-large a-color-base']") %>%
    html_text()

  # Number of Reviews (note: currently the same selector as Helpful Votes)
  n_reviews <- doc %>%
    html_nodes("[class='a-size-large a-color-base']") %>%
    html_text()

  # Return a tibble
  tibble(user_id = user_id,
         reviewer_ranking,
         helpful_votes,
         n_reviews)
}
#### List of IDs of users ####
id_list <- c("AE2RRRB42BQPO7HTSHCHKTBW442Q", "AH3CPFJRT5PTJEZKE2WZK5GLBQYQ", "AFOACDZPXXUUXUXCG4IGOAXJDS2A")
#### Scraping the information ####
users_info <- data.frame(matrix(ncol = 4, nrow = 0, dimnames = list(NULL, c("user_id", "reviewer_ranking", "helpful_votes", "n_reviews"))))
for (j in id_list) {
  message("Getting information for user with ID ", j)
  Sys.sleep(5)
  users_info <- rbind(users_info, scrape_user(j))
}
I think the problem could be that, when I open any user profile on Amazon, the page takes some time to load fully, and until then an almost empty page is shown. Is it possible to make rvest wait until the page has fully loaded before scraping the data? Or is the problem something else?
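One way to test this hypothesis is to count how many nodes each selector matches in the HTML that `read_html()` actually receives. The sketch below is only an illustration under stated assumptions: `count_matches()` is a hypothetical helper, and the inline HTML is an invented stand-in for a page whose content is added by JavaScript, not Amazon's real markup.

```r
library(rvest)

# Hypothetical helper: number of nodes matched by each CSS selector.
count_matches <- function(doc, selectors) {
  vapply(selectors, function(sel) length(html_nodes(doc, sel)), integer(1))
}

# Invented stand-in page: the profile <div> is empty in the static HTML,
# and a script would add the ranking <span> only when run in a browser.
static_html <- '
<html><body>
  <div id="profile"></div>
  <script>
    var s = document.createElement("span");
    s.className = "a-size-base";
    s.textContent = "#1,234";
    document.getElementById("profile").appendChild(s);
  </script>
</body></html>'

# read_html() parses the markup but does not execute the script.
doc <- read_html(static_html)

selectors <- c(ranking = "[class='a-size-base']",
               votes   = "[class='a-size-large a-color-base']")
counts <- count_matches(doc, selectors)
print(counts)
```

Running the same helper on `read_html(url_user)` for a real profile should show whether the static response contains the nodes at all. If every count is zero, each `html_text()` call returns a zero-length vector, and `tibble()` recycles the length-one `user_id` down to zero rows, which would explain the empty table. In that case adding a delay before `read_html()` would probably not help, because it parses only the HTML the server returns and does not run scripts.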