Rvest XPath problem

pawel95 · March 31, 2019, 8:55pm

Hello Everyone,

I have a problem with rvest. I am trying to scrape a dropdown list whose contents depend on the choice made in another dropdown list. Here is the problem:

From this webpage -> Volkswagen - Samochody Osobowe - Otomoto.pl

I want to extract all the car models that belong to one brand, for example Volkswagen, using html_nodes() and XPath.

//*[@id="param571"]/option <- XPath for car brands (BMW, Audi, Volkswagen etc.)
//*[@id="param573"]/option <- XPath for car models (only Volkswagen models etc.)

For the brands it works perfectly, but for the models it returns nothing at all. The result is:

character(0)

Code I used:

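library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# Download and parse the static HTML of the Volkswagen listing page
brand <- read_html("https://www.otomoto.pl/osobowe/volkswagen/")
# Select the <option> nodes of the model dropdown and extract their text
modelsv2 <- html_nodes(brand, xpath = '//*[@id="param573"]/option') %>%
  html_text()

modelsv2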

Any help is much appreciated. I have been fighting with this for over a week! Thanks

cderv · April 1, 2019, 9:14pm

Currently the XPath is not correct. You could use SelectorGadget to help you find the correct XPath. See the rvest vignette:
https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html

pawel95 · April 1, 2019, 10:17pm

Hey, thanks for the response. I have used it, and the new path looks like this:

//*[(@id = "select2-param573-container")]

but the result didn't change:

character(0)

cderv · April 2, 2019, 5:50am

It seems this website cannot be scraped with rvest alone, because what you want to get is not in the HTML page the server returns but is created dynamically by JavaScript in the browser.

You can check with

library(magrittr)
readLines(url("https://www.otomoto.pl/osobowe/volkswagen/")) %>% 
  stringr::str_detect("select2-param573-container") %>% 
  any()
#> [1] FALSE

Created on 2019-04-02 by the reprex package (v0.2.1.9000)

or by downloading the html file

download.file("https://www.otomoto.pl/osobowe/volkswagen/", destfile = {a <- tempfile()})
file.edit(a)
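
The same kind of check can also be done with rvest itself, by counting the option nodes under each id. Below is a minimal sketch; count_options() is a hypothetical helper name, and the inline HTML is only a made-up stand-in for the static page (the real markup may differ), used to show why an empty dropdown ends up as character(0).

```r
library(rvest)

# Hypothetical helper: count the <option> children of the element with this id.
count_options <- function(doc, id) {
  xpath <- sprintf('//*[@id="%s"]/option', id)
  length(html_nodes(doc, xpath = xpath))
}

# Made-up stand-in for the static HTML (an assumption, not the real page source):
# the brand list is filled in, the model list stays empty until JavaScript runs.
static_html <- '
<select id="param571">
  <option>Audi</option><option>BMW</option><option>Volkswagen</option>
</select>
<select id="param573"></select>'

doc <- read_html(static_html)

count_options(doc, "param571")  # options that exist in the static markup
count_options(doc, "param573")  # nothing to select, so html_text() has nothing to return
```

Running count_options() on the real page (parsed with read_html()) for both ids should tell you whether the nodes are missing from the downloaded HTML or only the XPath is off.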

You need to use other tools that run the page's JavaScript before you read the HTML, like RSelenium or PhantomJS.
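
As a rough idea of what that could look like with RSelenium, here is a sketch, not a tested solution for this site. The function names rendered_options() and scrape_models_live() are hypothetical, the 5 second wait is a guess, and it assumes a local Firefox that rsDriver() can start. It also assumes the model options end up under param573 once the page has rendered, which I have not verified.

```r
library(rvest)

# Hypothetical helper: let a browser client render the page, then hand the
# rendered HTML to rvest and reuse the XPath from the question.
# `client` only needs navigate() and getPageSource(), as an RSelenium
# remoteDriver has.
rendered_options <- function(client, url, id, wait = 5) {
  client$navigate(url)
  Sys.sleep(wait)  # crude wait for the JavaScript to run; the delay is a guess
  page <- read_html(client$getPageSource()[[1]])
  xpath <- sprintf('//*[@id="%s"]/option', id)
  html_text(html_nodes(page, xpath = xpath))
}

# Hypothetical wrapper for a live run. Assumes RSelenium is installed and that
# rsDriver() can start a Selenium server with Firefox on this machine.
scrape_models_live <- function() {
  rD <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  # Always close the browser and stop the server, even if scraping fails
  on.exit({
    rD$client$close()
    rD$server$stop()
  }, add = TRUE)
  rendered_options(rD$client,
                   "https://www.otomoto.pl/osobowe/volkswagen/",
                   "param573")
}
```

Calling scrape_models_live() would then return the model names as a character vector, if the assumptions above hold. If the dropdown only fills in after a click, you would additionally need to find the element with the client's findElement() and click it before reading the page source.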
