Rvest XPath problem

Hello Everyone,

I have a problem with rvest. I am trying to scrape a dropdown list whose contents depend on the choice made in another dropdown list. Here is the problem:

From this webpage -> https://www.otomoto.pl/osobowe/volkswagen/

I want to extract all the car models that belong to one brand (Volkswagen, for example) using html_nodes() and XPath:

//*[@id="param571"]/option <- XPath for car brands (BMW, Audi, Volkswagen etc.)
//*[@id="param573"]/option <- XPath for car models (only Volkswagen models etc.)

It works perfectly for the brands, but for the models it returns nothing at all. The result is:

character(0)

Code I used:

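library(rvest)  # also re-exports the %>% pipe

# Download and parse the page, then select the model <option> nodes
brand <- read_html("https://www.otomoto.pl/osobowe/volkswagen/")
modelsv2 <- html_nodes(brand, xpath = '//*[@id="param573"]/option') %>%
  html_text()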

modelsv2

Any help would be much appreciated. I have been fighting with this for over a week! Thanks

Currently the XPath is not correct. You could use SelectorGadget to help you find the correct XPath. See the rvest vignette:
https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html

Hey, thanks for the response. I have used it, and the new path looks like this:

//*[(@id = "select2-param573-container")]

but the result didn't change:

character(0)

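It seems this website cannot be scraped with rvest alone, because what you want to get is not in the HTML the server sends but is created dynamically by JavaScript in the browser. rvest only parses the downloaded HTML and does not run scripts, so the nodes you are selecting do not exist for it.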
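To illustrate the effect offline, here is a minimal sketch. The HTML string is made up for the example and is not otomoto.pl's actual markup; it only assumes that the models `<select>` arrives empty and is filled in by a script later. The ids are reused from the question.

```r
library(rvest)

# Made-up page: the brands <select> already has options in the HTML,
# while the models <select> is empty until a script fills it in the browser.
static_html <- paste0(
  '<html><body>',
  '<select id="param571"><option>Audi</option><option>Volkswagen</option></select>',
  '<select id="param573"></select>',
  '<script>/* options for param573 would be added here at runtime */</script>',
  '</body></html>'
)

page <- read_html(static_html)

# Same XPath pattern for both dropdowns
brands <- html_text(html_nodes(page, xpath = '//*[@id="param571"]/option'))
models <- html_text(html_nodes(page, xpath = '//*[@id="param573"]/option'))

brands
models
```

Under that assumption, `brands` holds the two option labels while `models` is empty, which matches the `character(0)` seen in the question.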

You can check with

library(magrittr)
readLines(url("https://www.otomoto.pl/osobowe/volkswagen/")) %>% 
  stringr::str_detect("select2-param573-container") %>% 
  any()
#> [1] FALSE

Created on 2019-04-02 by the reprex package (v0.2.1.9000)

or by downloading the html file

a <- tempfile(fileext = ".html")
download.file("https://www.otomoto.pl/osobowe/volkswagen/", destfile = a)
file.edit(a)

You need to use other tools like RSelenium or PhantomJS, which load the page in a browser engine and run its JavaScript before you read the HTML.
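A rough sketch of the RSelenium route, not a tested solution for this site. `extract_options()` and `scrape_models()` are hypothetical helper names; the sketch assumes RSelenium and Firefox are installed, the port number is arbitrary, and a fixed pause is enough for the scripts to finish. The models may only appear after the dropdown is opened or a brand is chosen, in which case extra clicks would be needed.

```r
library(rvest)

# Parse already-rendered HTML and return the option labels of one <select>.
extract_options <- function(page_source, select_id) {
  xpath <- sprintf('//*[@id="%s"]/option', select_id)
  html_text(html_nodes(read_html(page_source), xpath = xpath))
}

# Hypothetical helper: render the page in a real browser, then parse the result.
# Assumes RSelenium and Firefox are available; the port is an arbitrary choice.
scrape_models <- function(url, select_id = "param573", wait = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", port = 4567L, verbose = FALSE)
  on.exit({
    driver$client$close()   # close the browser session
    driver$server$stop()    # stop the Selenium server
  })
  driver$client$navigate(url)
  Sys.sleep(wait)  # crude wait so the page's scripts can run
  extract_options(driver$client$getPageSource()[[1]], select_id)
}

# Not run here, since it needs a browser and network access:
# models <- scrape_models("https://www.otomoto.pl/osobowe/volkswagen/")
```

Keeping the parsing in its own function means the XPath can be checked on a saved HTML file before involving the browser at all.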
