Scraping past HTML comments with rvest

The returned values are blank because the site's content (games, scores, channels, etc.) is rendered client-side by JavaScript, so it never appears in the raw HTML that rvest downloads.

I verified this as follows:

  1. Changing html_children() to html_structure(), which shows the relevant nodes coming back empty (see the rvest sketch after the console snippet below)

  2. Verifying the content through the JS console (I modified the path slightly to specifically target the elements that contain the broadcast channel). This returns a list of 12 nodes, one for each game broadcast on that date:


// updated selector path
var el = document.querySelectorAll('#scoresPage > .row:nth-child(2) > .scores__inner > div:nth-child(1) > .linescores-container > .game > .row > .large-12 > .linescore-header > .scores__inner__broadcaster > span:nth-child(2)');

// log each matched node
for (var i = 0; i < el.length; i++) {
    console.log(el[i]);
}
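
For step 1, the rvest side of the check looks roughly like this. It's a minimal sketch using the same URL as below; the exact node inspected is my own choice for illustration:

# what rvest sees: the raw HTML, with no JavaScript executed
library(rvest)  # also provides the %>% pipe
library(xml2)   # html_structure()

page <- read_html("https://stats.nba.com/scores/04/11/2018")

# if the content is rendered client-side, the subtree here comes back empty
page %>%
  html_node("#scoresPage") %>%
  html_structure()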

Using RSelenium is a better option in this situation, since it drives a real browser that executes the page's JavaScript. It would look something like this:

# install (first-time setup only)
devtools::install_github("johndharrison/binman")
devtools::install_github("johndharrison/wdman")
devtools::install_github("ropensci/RSelenium")

# set up
require(RSelenium)
rsd <- RSelenium::rsDriver(browser = "chrome")  # or other browser
rsc <- rsd$client

# navigate to page
rsc$navigate("https://stats.nba.com/scores/04/11/2018")
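
Since the scores are rendered by JavaScript, give the page a moment to finish loading before querying it (the five seconds below is an arbitrary guess; an explicit wait that polls for the element would be more robust):

# wait for the client-side rendering to finish
Sys.sleep(5)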


# set path
path <- "#scoresPage > .row:nth-child(2) > .scores__inner > div:nth-child(1) > .linescores-container > .game > .row > .large-12 > .linescore-header > .scores__inner__broadcaster > span:nth-child(2)"

# scrape elements
el <- rsc$findElements(using = "css", value = path)

# extract text
out <- sapply(el, function(x) x$getElementText())
channels <- unlist(out)  # flatten to a character vector of broadcaster names

# continue transformations here
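
Once you're done, close the browser and stop the Selenium server:

# clean up
rsc$close()
rsd$server$stop()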

You can find more information here: http://ropensci.github.io/RSelenium/

Hope that helps!
