Glad it worked out! I used Inspect Element and typed out the CSS path by reading the source code. Sorry, it looks like the previous version dropped some elements (I'm not sure what I was thinking by using span:nth-child(2)). I like the changes in the CSS path. The data is in a better format too.
Where there are images instead of text, you can extract the value of the bc attribute on the <stats-broadcaster-logo> element. The path to that element is defined below.
# set paths: for <span> and for <stats-broadcaster-logo>
path <- "#scoresPage > .row:nth-child(2) > .scores__inner > div:nth-child(1) > .linescores-container > .game > .row > .large-12 > .linescore-header > .scores__inner__broadcaster"
img.path <- paste0(path," > stats-broadcaster-logo")
Then use the getElementAttribute function to extract the value of the bc attribute.
# scrape elements
logo <- rsc$findElements(using = "css",value = img.path)
# extract text
imgs <- sapply(logo, function(x){ x$getElementAttribute("bc") })
imgs <- data.matrix(imgs)
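If you would rather end up with one value per game instead of two separate results, a helper along these lines might work. This is only a sketch: broadcaster_label is a hypothetical name, and it assumes each element matched by path holds either visible text or at most one <stats-broadcaster-logo> child. It uses the text when there is any and falls back to the bc attribute otherwise.

```r
# Hypothetical helper (not part of RSelenium): return one broadcaster label
# for a single element matched by `path`.
# Assumption: the element has either visible text or one logo child.
broadcaster_label <- function(x) {
  # getElementText() returns a list, so flatten it and trim whitespace
  txt <- trimws(unlist(x$getElementText()))
  if (length(txt) == 1 && nzchar(txt)) {
    return(txt)
  }
  # no visible text: look for a logo element inside this element
  logo <- x$findChildElements(using = "css", value = "stats-broadcaster-logo")
  if (length(logo) == 0) {
    return(NA_character_)
  }
  # the broadcaster name is stored in the bc attribute of the logo
  bc <- unlist(logo[[1]]$getElementAttribute("bc"))
  if (length(bc) == 1) bc else NA_character_
}
```

With an open session, something like vapply(rsc$findElements(using = "css", value = path), broadcaster_label, character(1)) should then give a character vector with one entry per game, and NA where neither text nor a logo was found.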
Here's the full R code.
# set up
library(RSelenium)
rsd <- RSelenium::rsDriver(browser = "chrome")
rsc <- rsd$client
# navigate to page
rsc$navigate("https://stats.nba.com/scores/04/11/2018")
# set paths: for <span> and for <stats-broadcaster-logo>
path <- "#scoresPage > .row:nth-child(2) > .scores__inner > div:nth-child(1) > .linescores-container > .game > .row > .large-12 > .linescore-header > .scores__inner__broadcaster"
img.path <- paste0(path," > stats-broadcaster-logo")
# scrape elements
el <- rsc$findElements(using = "css", value = path)
logo <- rsc$findElements(using = "css",value = img.path)
# extract text
out <- sapply(el, function(x){x$getElementText()})
channels <- data.matrix(out)
# extract attributes
imgs <- sapply(logo, function(x){ x$getElementAttribute("bc") })
imgs <- data.matrix(imgs)
# view
channels
imgs
# continue with transformations
# close the browser session and stop the Selenium server
rsc$close()
rsd$server$stop()
Hope that helps!