How to skip empty table while scraping with rvest::html_table()

Hi all, I'm trying to scrape the tables in this page - https://www.thegreyhoundrecorder.com.au/results/forbury-park/63442

I've got the page's contents into a variable page from which I extract the required tables using the following code

page %>%
    rvest::html_nodes(xpath = "//table[@class='raceResultsTable table-striped']") %>%
    rvest::html_table()

Because that page has no data for Race 2, I'm getting the error - Error in matrix(NA_character_, nrow = n, ncol = maxp) : invalid 'ncol' value (too large or NA)

Is there any way for me to tell html_table() to skip that empty table?

It works for other pages on the site like https://www.thegreyhoundrecorder.com.au/results/geelong/63461

Any help would be greatly appreciated. Thanks!

Hey @kaushiklakshman,

I would like to kindly ask you to always provide your full code whenever you are asking for help on a forum. For example, it would greatly help if you also added how your page variable was created :slight_smile:

The solution to your issue can be found in the possibly() function in the purrr package. It wraps a function so that it returns a default value instead of throwing an error, which lets your code keep running when one of the tables fails. The following code attempts to scrape all 15 tables on the link you provided.

library(dplyr)
library(rvest)
library(purrr)

link <- "https://www.thegreyhoundrecorder.com.au/results/forbury-park/63442"
xpaths <- paste0('//*[@id="race-', 1:15, '"]/table[2]')

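# Read the page and extract the results table matched by `xpath`.
# As described in the question, html_table() errors on the empty Race 2 table.
scrape_table <- function(link, xpath){
  
  link %>%
    read_html() %>%
    html_nodes(xpath = xpath) %>%
    html_table() %>%   # list holding the matched table
    flatten_df() %>%   # collapse that list into a single data frame
    setNames(c("plc", "name_box", "trainer", "time", "mgn", "split", "in_run", "wgt", "sire", "dam", "sp"))
  
}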

scrape_table_possibly <- possibly(scrape_table, otherwise = NULL)

scraped_tables <- map(xpaths, ~ scrape_table_possibly(link = link, xpath = .x))

The scraped_tables variable is a list of 15 elements (one for each table). The second element of the list is NULL, as specified by the otherwise argument of possibly().
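
If you would rather not keep the NULL placeholder, one option is to name the list elements and then drop the empty ones with purrr::compact(). Below is a minimal offline sketch of that idea. Note that parse_race() is a hypothetical stand-in for scrape_table() that I made up so the example runs without hitting the website; it simply fails on race 2 the way the real scrape does.

```r
library(purrr)

# Hypothetical stand-in for scrape_table(): errors on race 2,
# mimicking the failure on the empty Race 2 table.
parse_race <- function(i) {
  if (i == 2) stop("empty table")
  data.frame(race = i, plc = 1:3)
}

# Wrapped version returns NULL instead of raising an error
parse_race_possibly <- possibly(parse_race, otherwise = NULL)

results <- map(1:4, parse_race_possibly)

# Name the elements first so the race number survives the filtering
results <- set_names(results, paste0("race_", 1:4))

# compact() removes the NULL (empty) elements
non_empty <- compact(results)

names(non_empty)
```

Naming before compacting matters: once the NULL is dropped, the list positions no longer line up with the race numbers.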

Hope this helps.


Thanks very much @gueyenono! I read about possibly() recently and thought it was really cool, but promptly forgot about its existence. It really solves this elegantly!

And thanks very much for the tip about full reproducibility, will do going forward. Apologies!

You're very welcome @kaushiklakshman. Glad I could help :slight_smile:
