Hello Everyone,
I have a problem with rvest. I am trying to scrape a dropdown list whose contents depend on the choice made in another dropdown list. Here is the problem:
From this webpage -> Volkswagen - Samochody Osobowe - Otomoto.pl
I want to extract all the car models that belong to one brand, for example Volkswagen, using html_nodes() and XPath:
//*[@id="param571"]/option <- XPath for car brands (BMW, Audi, Volkswagen, etc.)
//*[@id="param573"]/option <- XPath for car models (only Volkswagen models, etc.)
For the brands this works perfectly, but for the models it returns nothing at all. The result is:
character(0)
Code I used:
library(rvest)
brand <- read_html("https://www.otomoto.pl/osobowe/volkswagen/")
modelsv2 <- html_nodes(brand, xpath = '//*[@id="param573"]/option') %>%
  html_text()
modelsv2
Any help is much appreciated. I have been fighting with this for over a week! Thanks
cderv
April 1, 2019, 9:14pm
2
The XPath is currently not correct. You could use SelectorGadget to help you find the correct XPath. See the rvest vignette:
https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html
Hey, thanks for the response. I have used it, and the new path looks like this:
//*[(@id = "select2-param573-container")]
but the result didn't change:
character(0)
cderv
April 2, 2019, 5:50am
4
It seems this website cannot be scraped with rvest, because what you want is not in the HTML source of the page but is created dynamically by JavaScript. rvest only parses the HTML the server sends; it does not run the page's scripts.
You can check with
library(magrittr)
readLines(url("https://www.otomoto.pl/osobowe/volkswagen/")) %>%
stringr::str_detect("select2-param573-container") %>%
any()
#> [1] FALSE
Created on 2019-04-02 by the reprex package (v0.2.1.9000)
or by downloading the HTML file and opening it:
a <- tempfile(fileext = ".html")
download.file("https://www.otomoto.pl/osobowe/volkswagen/", destfile = a)
file.edit(a)
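The same effect can be reproduced offline. Here is a minimal sketch, assuming a made-up miniature page (the HTML string below is hypothetical and only borrows the two ids from the question): the brand options are written in the source, while the model options would only be added by a script in a browser. read_html() parses the text and never runs the script, so the second query comes back empty.

```r
library(rvest)

# Hypothetical page source, not the real otomoto.pl markup.
# Brands are present in the HTML; models are only added by the script.
page_source <- '<html><body>
  <select id="param571">
    <option>Audi</option>
    <option>BMW</option>
    <option>Volkswagen</option>
  </select>
  <select id="param573"></select>
  <script>
    var s = document.getElementById("param573");
    ["Golf", "Passat"].forEach(function (m) {
      var o = document.createElement("option");
      o.text = m;
      s.add(o);
    });
  </script>
</body></html>'

# read_html() parses the string as HTML; the script is kept as text, not executed
page <- read_html(page_source)

brands <- page %>%
  html_nodes(xpath = '//*[@id="param571"]/option') %>%
  html_text()

models <- page %>%
  html_nodes(xpath = '//*[@id="param573"]/option') %>%
  html_text()

brands
models
```

Here brands holds the three brand names and models is character(0), which matches what the question reports for the real page.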
You need to use other tools, such as RSelenium or PhantomJS, which load the page in a browser and run its JavaScript.
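As a rough sketch of the RSelenium route, and not a tested solution for this particular site: the helper below, rendered_options(), is a hypothetical name. It assumes RSelenium is installed and able to start a local Firefox session, and it assumes a fixed wait is enough for the page's scripts to finish. It lets the browser render the page and then passes the rendered HTML to rvest as before.

```r
library(rvest)

# Hypothetical helper: open the page in a real browser so its JavaScript runs,
# then query the rendered HTML with rvest.
# Assumes RSelenium can start a local Firefox session on this machine.
rendered_options <- function(url, xpath, wait = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  # Always close the browser and stop the server, even on error
  on.exit({
    driver$client$close()
    driver$server$stop()
  }, add = TRUE)

  driver$client$navigate(url)
  Sys.sleep(wait)  # crude fixed wait for the scripts to finish (an assumption)

  driver$client$getPageSource()[[1]] %>%
    read_html() %>%
    html_nodes(xpath = xpath) %>%
    html_text()
}

# Not run here, since it needs a browser:
# rendered_options("https://www.otomoto.pl/osobowe/volkswagen/",
#                  '//*[@id="param573"]/option')
```

Whether the model list is filled in on page load or only after the dropdown is opened is something to check in the browser first; if it is the latter, the dropdown would also have to be clicked through the driver before reading the page source.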