Error Message or Print Statement

I'm trying to scrape data from Google Scholar for a large list of people. I made a similar post here.

Someone mentioned the error message may not be an error, but instead a print statement. I've tried modifying my code to recognize the 429 error print statement, but the results are still the same.

scholar_ids <- character(nrow(info))
scholar_ids[] <- NA  # Initialize with NA values

last_successful_iteration <- 0  # Initialize last successful iteration

i <- 1  # Start at the first row

while (i <= nrow(info)) {
  id <- NULL
  
  output <- capture.output({
    tryCatch({
      id <- get_scholar_id(last_name = info$last_name[i],
                           first_name = info$first_name[i],
                           affiliation = "School Name")
      last_successful_iteration <- i  # Update last successful iteration
    }, error = function(err) {
      cat("Error message:", err$message, "\n")  # Print the error message
      stop(err)  # Stop on other errors
    }, warning = function(warn) { cat("Warning: ", warn$message, "\n") })
  }, type = "message")
  
  # Check if the output contains "429"
  if (any(grepl("429", output))) {
    cat("Timeout. Pausing for 15 minutes.\n")
    Sys.sleep(900)  # Pause for 15 minutes (900 seconds)
    next  # Retry the same iteration
  }
  
  if (!is.null(id) && length(id) > 0) {
    scholar_ids[i] <- id  # Store the ID if successfully retrieved
  }
  
  i <- i + 1  # Increment the loop variable to move to the next iteration

  # Random sleep to avoid hitting rate limits
  sleep_time <- runif(n = 1, min = 10, max = 12)
  Sys.sleep(sleep_time)
}

If this is relevant, the 429 error code message is output in red in the console, which is why I thought it was a real error. I also tried the advice from the person in my previous post, but it didn't work. I would appreciate any help on this issue.

What are the results? Can you show the output?

I think you can't use capture.output and then inspect the results if you re-throw the errors that you catch.

Demo of your problem:

output1 <- capture.output({stop("istoppedit")})
output1
# Error: object 'output1' not found

output2 <- capture.output(tryCatch(stop("istoppedit"),
                                   error = function(err){
                                     # print the error, but problematically re-stop
                                     cat("Error message:", err$message, "\n")  
                                     stop("arbitrary re stop")
                                   }))
output2
# Error: object 'output2' not found 

output3 <- capture.output(tryCatch(stop("istoppedit"),
                                   error = function(err){
                                     # print the error, but don't re-stop
                                     cat("Error message:", err$message, "\n")  # Print the error message
                                   }))
output3
# "Error message: istoppedit "
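
One more detail that may matter here: the code in the question passes type = "message" to capture.output, while its handlers print with cat(). In base R, cat() writes to standard output by default and message() writes to the message (stderr) stream, so which stream you capture decides what ends up in the result. A minimal sketch, meant for a plain R session (some front ends handle messages differently); the variable names are only illustrative:

```r
# cat() writes to stdout, so the default type = "output" captures it.
out_stdout <- capture.output(cat("429 via cat\n"))

# message() writes to the message stream, so type = "message" is needed.
out_message <- capture.output(message("429 via message"), type = "message")

# With only the message stream diverted, cat() output is not captured;
# it is printed to the console as usual.
out_missed <- capture.output(cat("429 via cat\n"), type = "message")

# Check each captured vector for the "429" text.
any(grepl("429", out_stdout))
any(grepl("429", out_message))
any(grepl("429", out_missed))
```

If the 429 text shows up in red in the console, it is probably going to the message stream, but that is a guess; the safest check is to print each captured vector and see where the text actually lands.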

In your code as presented here, I would expect any(grepl("429", output)) to fail because no output object exists. The exception is when output was created at least once by an iteration that hit no errors. In that case the error text will never be in output, and the if (any(...)) test will always be FALSE, because you are always testing the last successful attempt.
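
One way around this, as a rough sketch rather than a tested fix for your exact setup: have tryCatch return the condition object instead of re-throwing inside it, then inspect the message afterwards. This assumes the 429 is signalled as a real R error; if the package emits it with message() or warning() instead, the handler would need to be message = or warning = accordingly. fake_get_scholar_id, the toy info data frame, and pause_seconds are hypothetical stand-ins so the example runs on its own.

```r
# Hypothetical stand-in for get_scholar_id(): fails with a 429-style error
# on the first call, then succeeds. Not the real package function.
attempts <- 0
fake_get_scholar_id <- function(last_name, first_name) {
  attempts <<- attempts + 1
  if (attempts == 1) stop("HTTP error 429: Too Many Requests")
  paste0("ID_", last_name)
}

# Toy data in place of the real `info` data frame.
info <- data.frame(last_name = c("Smith", "Jones"),
                   first_name = c("Ann", "Bob"),
                   stringsAsFactors = FALSE)
scholar_ids <- rep(NA_character_, nrow(info))
pause_seconds <- 0.1  # short pause for the demo; the question used 900

i <- 1
while (i <= nrow(info)) {
  # Return either the ID or the error object; never re-throw in the handler.
  result <- tryCatch(
    fake_get_scholar_id(info$last_name[i], info$first_name[i]),
    error = function(err) err
  )

  if (inherits(result, "error")) {
    if (grepl("429", conditionMessage(result), fixed = TRUE)) {
      cat("Rate limited. Pausing before retrying row", i, "\n")
      Sys.sleep(pause_seconds)
      next  # retry the same row
    }
    stop(result)  # any other error is re-raised, outside tryCatch
  }

  scholar_ids[i] <- result  # store the ID for this row
  i <- i + 1
}
scholar_ids
```

Because the decision is made on the returned condition, there is no need for capture.output at all in this version, and the check always refers to the current attempt rather than a previous one.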
