rvest cannot access past match IDs

I am new to rvest and am trying to get the match IDs for every game played on a specific date in a specific league. I know how to use html_nodes to look up classes or IDs, but I cannot reach what I need here. Each match URL looks like http://www.espn.com/soccer/match?gameId=XXXXXX, where XXXXXX is the match ID. That ID is what I want to collect for all matches.

The match IDs I need are the id attributes of the article elements nested below the node selected in the code below, but I cannot seem to get any deeper in the HTML structure than this.

Can anyone suggest a fix? Might this content be generated on the server side?

library(rvest)
url <- "http://www.espn.com/soccer/scoreboard/_/league/esp.1/date/20190106"
scoreboard <- url %>% read_html()
GameID <- scoreboard %>% html_nodes(xpath = '//*[@id="events"]') 


I think this is because the div with id events is empty in the HTML the server returns and is only filled in dynamically by JavaScript running in the browser.
rvest parses the static HTML and does not execute JavaScript, so you can't use it on its own for this task. You need a package that wraps a tool able to handle JavaScript-driven websites:

  • RSelenium
  • splashr
  • or a non-R tool like PhantomJS
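
To make the RSelenium route more concrete, here is a minimal sketch rather than a tested solution. It assumes, as the question states, that the match IDs are the id attributes of the article elements under the events div. The helper names extract_match_ids and fetch_rendered_html are hypothetical, the two HTML strings are made-up stand-ins for the static and rendered page, and the fixed five-second wait is a guess. It also assumes RSelenium is installed and can start a Firefox driver locally.

```r
library(rvest)

# Hypothetical helper: return the id attribute of every <article> nested
# under the node with id "events". `html` is a string of page source.
extract_match_ids <- function(html) {
  read_html(html) %>%
    html_nodes(xpath = '//*[@id="events"]//article') %>%
    html_attr("id")
}

# Hypothetical helper: load the page in a real browser so its JavaScript
# runs, then return the rendered page source as a single string.
# Assumes RSelenium can start a local Firefox driver.
fetch_rendered_html <- function(url, wait_seconds = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  on.exit({
    driver$client$close()   # close the browser window
    driver$server$stop()    # shut down the Selenium server
  }, add = TRUE)
  driver$client$navigate(url)
  Sys.sleep(wait_seconds)   # crude fixed wait for the scripts to fill the page
  driver$client$getPageSource()[[1]]
}

# Made-up stand-ins: what rvest alone receives versus what a browser ends
# up with. The IDs are invented for illustration.
static_html <- '<html><body><div id="events"></div></body></html>'
rendered_html <- paste0(
  '<html><body><div id="events">',
  '<article id="111111"></article><article id="222222"></article>',
  '</div></body></html>'
)

extract_match_ids(static_html)    # character(0): nothing below the empty div
extract_match_ids(rendered_html)  # "111111" "222222"
```

With a working driver, something like `fetch_rendered_html(url) %>% extract_match_ids()` should then return the IDs for the scoreboard URL from the question, provided the page structure matches the assumption above.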
