Web Scraping using rvest

Hi, I wrote some R code to scrape dealer addresses from the Daikin website, but I am encountering a small error. Kindly review and advise. Thanks.

Website URL : https://www.daikinindia.com/products-services/daikin-dealer-locator

R code:

url_base <- "https://www.daikinindia.com/products-services/daikin-dealer-locator"
library(rvest)
page <- read_html(url_base)

Dealer_Daiken <- data.frame(State = html_text(html_nodes(page,"#state")),City = html_text(html_nodes(page,"#city")),
Locality = html_text(html_nodes(page,"#locality")),Organisation = html_text(html_nodes(page,".org_name")),
Del_Name = html_text(html_nodes(page,".d_name")), Add = html_text(html_nodes(page,".d_add")), PIN = html_text(html_nodes(page,".pin_clas")),
Phone = html_text(html_nodes(page,".phone_clas")), Mobile = html_text(html_nodes(page,".mobile_class")),  eMail = html_text(html_nodes(page,".email_clas"))) 


Error message:
Error in data.frame(State = html_text(html_nodes(page, "#state")), City = html_text(html_nodes(page,  :
  arguments imply differing number of rows: 1, 0
The expected output is:
=============================
 SrNo	State	City	Locality	Dealer Name	DealerContact	Add 1	Add 2	Add 3	Add 4	PIN	Contact No	Mobile No	e-Mail ID
1	Tamil Nadu	Chennai	Nungabakkam	Freeze Air Cools	S Suresh	New No 58 (Old No 45), Puspha Nagar, Main Road		Nungambakkam, Chennai, Tamil Nadu		600034	xxxxx	tttttt	xyz@gmail.com
2	Tamil Nadu	Chennai	Nungabakkam	Glacier Air Systems Pvt. Ltd.	S.M.S. Salahuddin|Ashraf A.R. Buhari	No.204, II Floor, Real Enclave, 43, Josier Street	Nungambakkam, Chennai, Tamil Nadu	Nungambakkam, Chennai, Tamil Nadu		600034	xxxxx	tttttt	xyz@gmail.com


That error message is telling you that the columns you're using to build the data.frame have different lengths. Note that in the reprex below, State is a vector of length one (a single long character string), whereas eMail is empty (length zero). A data frame must have columns of equal length.
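The same error can be reproduced without touching the website at all. Here is a minimal sketch in base R; the values of `state` and `email` are made-up stand-ins for what the two selectors return, not data from the page.

```r
# Stand-ins for the scraped values (illustrative, not real page data)
state <- "Select StateAndaman and Nicobar Islands"  # one long string
email <- character(0)                                # selector matched nothing

length(state)  # 1
length(email)  # 0

# data.frame() cannot combine a length-1 column with a length-0 column.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  data.frame(State = state, eMail = email),
  error = function(e) conditionMessage(e)
)
print(result)
```

So the problem is not in data.frame() itself; it is that the selectors do not return one value per dealer.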

A good way to understand what's happening here is to run each of those html_text and html_nodes operations on its own and look at what it returns. See below:

url_base <- "https://www.daikinindia.com/products-services/daikin-dealer-locator"
library(rvest)
#> Loading required package: xml2
page <- read_html(url_base)

Dealer_Daiken <- data.frame(
  State = html_text(html_nodes(page,"#state")),
  City = html_text(html_nodes(page,"#city")),
  Locality = html_text(html_nodes(page,"#locality")),
  Organisation = html_text(html_nodes(page,".org_name")),
  Del_Name = html_text(html_nodes(page,".d_name")), 
  Add = html_text(html_nodes(page,".d_add")), 
  PIN = html_text(html_nodes(page,".pin_clas")),
  Phone = html_text(html_nodes(page,".phone_clas")), 
  Mobile = html_text(html_nodes(page,".mobile_class")),  
  eMail = html_text(html_nodes(page,".email_clas"))
  ) 
#> Error in data.frame(State = html_text(html_nodes(page, "#state")), City = html_text(html_nodes(page, : arguments imply differing number of rows: 1, 0


# Note, one large character string
html_text(html_nodes(page,"#state"))
#> [1] "Select StateAndaman and Nicobar IslandsAndhra PradeshAssamBiharChandigarhChhattisgarhDadar & HaveliDelhiGoaGujaratHaryanaHimachal PradeshHyderabad Jammu & KashmirJharkhandKarnatakaKathmanduKeralaMadhya PradeshMaharashtraOdishaOrissaPondicherryPunjabRajasthanTamil NaduTelanganaTripuraUttar PradeshUttarakhandWest Bengal"

# Note, returns empty
html_text(html_nodes(page,".email_clas"))
#> character(0)

Created on 2020-04-16 by the reprex package (v0.3.0)
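If you want to check every selector at once rather than one by one, something like the sketch below may help. To keep it runnable offline it uses a small hand-written HTML string as a hypothetical stand-in for the dealer page (the real markup may differ); with the live site you would pass the `page` object from `read_html(url_base)` instead.

```r
library(rvest)

# Hypothetical stand-in for the page before any search is submitted:
# dropdowns are present, but there are no dealer rows yet.
html <- '<html><body>
  <select id="state"><option>Select State</option><option>Tamil Nadu</option></select>
  <select id="city"><option>Select City</option></select>
</body></html>'
page <- read_html(html)

# A few of the selectors from the original code
selectors <- c(State = "#state", City = "#city", eMail = ".email_clas")

# Count how many nodes each selector matches
n_matches <- vapply(
  selectors,
  function(css) length(html_nodes(page, css)),
  integer(1)
)
print(n_matches)

# "#state" matches the whole <select>, so html_text() pastes all options
# together. Selecting the <option> nodes gives one string per state.
states <- html_text(html_nodes(page, "#state option"))
print(states)
```

Any selector with a count of 0 has nothing to scrape on the page as downloaded, which is what happens with the dealer fields here.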


If you look at that web page, you'll see that you need to fill out a set of dropdown fields and click submit before the page displays any data. For that kind of web scraping, you'll need R to drive a web browser. A popular tool for that is RSelenium.

Here's the package vignette on getting started:

https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html
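As a rough idea of what that could look like, here is an untested sketch. It assumes RSelenium, rvest and xml2 are installed, along with Firefox and a matching driver. The `#state` id and `.org_name` class are taken from the code in the question and have not been checked against the live page. `scrape_dealers` and its `submit_css` argument are hypothetical names: you would need to find the real selector for the submit button with your browser's inspector.

```r
# Sketch only: defines a helper but does not start a browser until called.
scrape_dealers <- function(url, state, submit_css) {
  # Start a Selenium server and open a browser session
  rD <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- rD$client
  # Always shut the browser and server down, even if a step below fails
  on.exit({
    remDr$close()
    rD$server$stop()
  }, add = TRUE)

  remDr$navigate(url)

  # Choose the requested state in the dropdown (id assumed from the question)
  option <- remDr$findElement(
    using = "xpath",
    value = sprintf(
      "//select[@id='state']/option[normalize-space(text())='%s']",
      state
    )
  )
  option$clickElement()
  # City and locality would need to be selected the same way.

  # Click the submit button (selector supplied by the caller)
  remDr$findElement(using = "css selector", value = submit_css)$clickElement()
  Sys.sleep(2)  # crude wait for the results to load

  # Hand the rendered HTML back to rvest
  page <- xml2::read_html(remDr$getPageSource()[[1]])
  rvest::html_text(rvest::html_nodes(page, ".org_name"))
}
```

Once the rendered page source is in `page`, the rest of the rvest code from the question can be applied to it, provided the class names match what the site actually uses.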


Thank you for the quick reply. I will go through RSelenium and try to rewrite the code.
However, can't we modify this code so that it discards all the items that cause the error? Please let me know if you can guide me on that.