rvest XPath problem

Hello everyone,

I have a problem with rvest. I am trying to scrape a dropdown list whose options depend on the choice made in another dropdown list. Here is the problem:

From this web page: https://www.otomoto.pl/osobowe/volkswagen/

I want to extract, for example, all the car models that belong to Volkswagen, using html_nodes() and XPath:

//*[@id="param571"]/option  (XPath for car brands: BMW, Audi, Volkswagen, etc.)
//*[@id="param573"]/option  (XPath for car models: only the Volkswagen models, etc.)

For the brands this works perfectly, but for the models it returns no results at all:


Code I used:

library(rvest)
brand <- read_html("https://www.otomoto.pl/osobowe/volkswagen/")
modelsv2 <- html_nodes(brand, xpath = '//*[@id="param573"]/option') %>%
  html_text()  # model names as a character vector


Any help is much appreciated; I have been fighting with this for over a week! Thanks

Currently the XPath is not correct. You could use SelectorGadget to help you find the correct XPath; see the rvest vignette.
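One problem with the XPaths as first written is the missing node test: `//[@id="param571"]` is not a valid XPath expression, whereas `//*[@id="param571"]` selects any element carrying that id. Below is a minimal offline sketch; the HTML is a hypothetical hard-coded snippet standing in for the brand dropdown, not the real otomoto.pl source.

```r
library(rvest)
library(magrittr)

# Hypothetical static markup standing in for the brand dropdown (not the real page)
page <- read_html('
<select id="param571">
  <option value="audi">Audi</option>
  <option value="bmw">BMW</option>
  <option value="volkswagen">Volkswagen</option>
</select>')

# `*` is the node test (any element); the predicate then filters on the id,
# and `/option` steps down to the direct <option> children
brands <- page %>%
  html_nodes(xpath = '//*[@id="param571"]/option') %>%
  html_text()

# brands is c("Audi", "BMW", "Volkswagen")
print(brands)
```

This works because the options are present in the HTML itself; a correct XPath alone does not help when they are not.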


Hey, thanks for the response. I have used it, and the new path looks like this:

//*[(@id = "select2-param573-container")]

but the result didn't change:


It seems this website cannot be scraped with rvest alone, because what you want is not in the HTML page the server sends; it is created dynamically by JavaScript in the browser.

You can check with

library(magrittr)
readLines(url("https://www.otomoto.pl/osobowe/volkswagen/")) %>% 
  stringr::str_detect("select2-param573-container") %>% 
  any()
#> [1] FALSE

Created on 2019-04-02 by the reprex package (v0.2.1.9000)

or by downloading the HTML file and inspecting it:

download.file("https://www.otomoto.pl/osobowe/volkswagen/", destfile = {a <- tempfile()})
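To see why rvest finds nothing, here is a minimal offline sketch with hypothetical markup (not the real otomoto.pl source): the server delivers an empty `<select>`, and the options would only be added by a script running in the browser. rvest parses the HTML as delivered and never executes the script.

```r
library(rvest)
library(magrittr)

# Hypothetical markup mimicking a JavaScript-driven page (not the real site):
# the server sends an empty <select>; only a browser would run the script.
page <- read_html('
<html><body>
  <select id="param573"></select>
  <script>
    var s = document.getElementById("param573");
    s.add(new Option("Golf"));
  </script>
</body></html>')

# The select element itself is in the static HTML ...
n_selects <- length(html_nodes(page, xpath = '//*[@id="param573"]'))

# ... but its options are not, because rvest does not execute JavaScript
models <- page %>%
  html_nodes(xpath = '//*[@id="param573"]/option') %>%
  html_text()

n_selects       # 1
length(models)  # 0
```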

You need to use other tools that execute the JavaScript, such as RSelenium or PhantomJS.
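One way to combine these tools with rvest is to let a browser render the page and then parse the rendered source with the same XPath. The sketch below assumes that split: `extract_models()` is a hypothetical helper, the RSelenium calls are shown only as comments (untested here, they need a working Selenium and browser setup), and the rendered HTML is a made-up stand-in so the code runs offline. The id in the real rendered page may also differ from `param573`.

```r
library(rvest)
library(magrittr)

# Hypothetical helper: extract the option labels of the model dropdown
# from a page source string; the id "param573" comes from the thread
extract_models <- function(page_source) {
  read_html(page_source) %>%
    html_nodes(xpath = '//*[@id="param573"]/option') %>%
    html_text(trim = TRUE)
}

# In a real run, page_source would come from a browser that has executed
# the JavaScript, for example (untested here):
#   rD <- RSelenium::rsDriver(browser = "firefox")
#   rD$client$navigate("https://www.otomoto.pl/osobowe/volkswagen/")
#   page_source <- rD$client$getPageSource()[[1]]

# Made-up stand-in for rendered HTML so the sketch runs offline
page_source <- '<select id="param573">
  <option> Golf </option>
  <option> Passat </option>
</select>'

models <- extract_models(page_source)
```

Keeping rendering and parsing separate means the parsing step can be checked against saved HTML without starting a browser each time.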
