I'm trying to scrape data from Google Scholar for a large list of people. I made a similar post here.
Someone mentioned that the 429 message may not be a real error, but just printed output. I modified my code to capture that output and check it for "429", but the results are still the same.
scholar_ids <- character(nrow(info))
scholar_ids[] <- NA # Initialize with NA values
last_successful_iteration <- 0 # Initialize last successful iteration
i <- 1 # Start at the first row
while (i <= nrow(info)) {
  id <- NULL
  output <- capture.output({
    tryCatch({
      id <- get_scholar_id(last_name = info$last_name[i],
                           first_name = info$first_name[i],
                           affiliation = "School Name")
      last_successful_iteration <- i # Update last successful iteration
    }, error = function(err) {
      cat("Error message:", err$message, "\n") # Print the error message
      stop(err) # Stop on other errors
    }, warning = function(warn) { cat("Warning: ", warn$message, "\n") })
  }, type = "message")
  # Check if the output contains "429"
  if (any(grepl("429", output))) {
    cat("Timeout. Pausing for 15 minutes.\n")
    Sys.sleep(900) # Pause for 15 minutes (900 seconds)
    next # Retry the same iteration
  }
  if (!is.null(id) && length(id) > 0) {
    scholar_ids[i] <- id # Store the ID if successfully retrieved
  }
  i <- i + 1 # Increment the loop variable to move to the next iteration
  # Random sleep to avoid hitting rate limits
  sleep_time <- runif(n = 1, min = 10, max = 12)
  Sys.sleep(sleep_time)
}
In case it is relevant, the 429 message is printed in red in the console, which is why I assumed it was a real error. I also tried the advice from my previous post, but it didn't work. I would appreciate any help with this issue.
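Messages, warnings, and errors are all written to stderr, which RStudio shows in red, so the color alone does not say which kind of condition the 429 is. Below is a minimal sketch of one way to check, not a fix for the loop above. `fake_lookup()` and `classify_429()` are hypothetical helpers introduced only for illustration. `fake_lookup()` stands in for `get_scholar_id()` so the example runs without network access, and the message text "HTTP error 429" is an assumption.

```r
# Hypothetical stand-in for get_scholar_id(): signals a 429 as an error,
# a warning, or a message, or returns an ID when `how` matches none of them.
fake_lookup <- function(how) {
  switch(how,
    error   = stop("HTTP error 429"),
    warning = warning("HTTP error 429"),
    message = message("HTTP error 429"),
    "abc123"  # default branch: a successful lookup
  )
}

# Evaluate `expr` and record which kind of condition (if any) mentioned "429".
classify_429 <- function(expr) {
  seen <- "none"
  value <- withCallingHandlers(
    tryCatch(expr, error = function(e) {
      if (grepl("429", conditionMessage(e))) {
        seen <<- "error"
        NULL          # treat a 429 error as "no result"
      } else {
        stop(e)       # re-raise anything that is not a 429
      }
    }),
    warning = function(w) {
      if (grepl("429", conditionMessage(w))) seen <<- "warning"
      invokeRestart("muffleWarning")  # continue without printing the warning
    },
    message = function(m) {
      if (grepl("429", conditionMessage(m))) seen <<- "message"
      invokeRestart("muffleMessage")  # continue without printing the message
    }
  )
  list(value = value, seen = seen)
}

# Try each way of signalling and print how the 429 was detected.
for (how in c("error", "warning", "message", "ok")) {
  res <- classify_429(fake_lookup(how))
  cat(how, "->", res$seen, "\n")
}
```

If `seen` comes back as "error" for the real call, then an error handler that calls `stop(err)` would end the loop before the check on the captured output ever runs.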