Web Scraping using rvest

Hi, I wrote some R code to scrape the dealer addresses from the Daikin website. I am encountering a small error. Kindly review and advise. Thanks.

Website URL : https://www.daikinindia.com/products-services/daikin-dealer-locator

R Code:

url_base <- "https://www.daikinindia.com/products-services/daikin-dealer-locator"
library(rvest)
page <- read_html(url_base)

Dealer_Daiken <- data.frame(State = html_text(html_nodes(page, "#state")),
                            City = html_text(html_nodes(page, "#city")),
                            Locality = html_text(html_nodes(page, "#locality")),
                            Organisation = html_text(html_nodes(page, ".org_name")),
                            Del_Name = html_text(html_nodes(page, ".d_name")),
                            Add = html_text(html_nodes(page, ".d_add")),
                            PIN = html_text(html_nodes(page, ".pin_clas")),
                            Phone = html_text(html_nodes(page, ".phone_clas")),
                            Mobile = html_text(html_nodes(page, ".mobile_class")),
                            eMail = html_text(html_nodes(page, ".email_clas")))


Error message:
Error in data.frame(State = html_text(html_nodes(page, "#state")), City = html_text(html_nodes(page,  :
  arguments imply differing number of rows: 1, 0
The expected output is:
=============================
 SrNo	State	City	Locality	Dealer Name	DealerContact	Add 1	Add 2	Add 3	Add 4	PIN	Contact No	Mobile No	e-Mail ID
1	Tamil Nadu	Chennai	Nungabakkam	Freeze Air Cools	S Suresh	New No 58 (Old No 45), Puspha Nagar, Main Road		Nungambakkam, Chennai, Tamil Nadu		600034	xxxxx	tttttt	xyz@gmail.com
2	Tamil Nadu	Chennai	Nungabakkam	Glacier Air Systems Pvt. Ltd.	S.M.S. Salahuddin|Ashraf A.R. Buhari	No.204, II Floor, Real Enclave, 43, Josier Street	Nungambakkam, Chennai, Tamil Nadu	Nungambakkam, Chennai, Tamil Nadu		600034	xxxxx	tttttt	xyz@gmail.com


That error message is telling you that the elements you're using to build the data.frame are incompatible: a data frame must have columns of equal length. Note in the reprex below that State is a vector of length one (a long character string), while eMail is empty.
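
Here's a minimal illustration of that constraint (made-up columns, nothing to do with the Daikin page):

# One column has 1 value, the other has 0, so data.frame() refuses
data.frame(x = "a", y = character(0))
#> Error in data.frame(x = "a", y = character(0)): arguments imply differing number of rows: 1, 0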

A good way to understand what's happening here is to look at each of those html_text and html_nodes operations individually. See below:

url_base <- "https://www.daikinindia.com/products-services/daikin-dealer-locator"
library(rvest)
#> Loading required package: xml2
page <- read_html(url_base)

Dealer_Daiken <- data.frame(
  State = html_text(html_nodes(page,"#state")),
  City = html_text(html_nodes(page,"#city")),
  Locality = html_text(html_nodes(page,"#locality")),
  Organisation = html_text(html_nodes(page,".org_name")),
  Del_Name = html_text(html_nodes(page,".d_name")), 
  Add = html_text(html_nodes(page,".d_add")), 
  PIN = html_text(html_nodes(page,".pin_clas")),
  Phone = html_text(html_nodes(page,".phone_clas")), 
  Mobile = html_text(html_nodes(page,".mobile_class")),  
  eMail = html_text(html_nodes(page,".email_clas"))
  ) 
#> Error in data.frame(State = html_text(html_nodes(page, "#state")), City = html_text(html_nodes(page, : arguments imply differing number of rows: 1, 0


# Note, one large character string
html_text(html_nodes(page,"#state"))
#> [1] "Select StateAndaman and Nicobar IslandsAndhra PradeshAssamBiharChandigarhChhattisgarhDadar & HaveliDelhiGoaGujaratHaryanaHimachal PradeshHyderabad Jammu & KashmirJharkhandKarnatakaKathmanduKeralaMadhya PradeshMaharashtraOdishaOrissaPondicherryPunjabRajasthanTamil NaduTelanganaTripuraUttar PradeshUttarakhandWest Bengal"

# Note, returns empty
html_text(html_nodes(page,".email_clas"))
#> character(0)

Created on 2020-04-16 by the reprex package (v0.3.0)


If you look at that web page, you'll see that you need to fill out a set of dropdown fields and click submit before the page displays any data. For that kind of web scraping you'll need R to interact with your web browser. A popular tool for that is RSelenium.

Here's the package vignette on getting started:

https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html
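
For reference, here's a rough, untested sketch of how that could look. The dropdown selectors (#state, #city, #locality) are the ones from your code; the submit-button selector ("#dealer-submit"), the example dropdown values, and the wait time are placeholders you'd need to check against the live page:

library(RSelenium)
library(rvest)

# Start a browser session (assumes a local Firefox + Selenium setup)
rD    <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- rD$client

remDr$navigate("https://www.daikinindia.com/products-services/daikin-dealer-locator")

# Fill in the dropdowns (example values only)
state <- remDr$findElement(using = "css selector", "#state")
state$sendKeysToElement(list("Tamil Nadu"))

city <- remDr$findElement(using = "css selector", "#city")
city$sendKeysToElement(list("Chennai"))

locality <- remDr$findElement(using = "css selector", "#locality")
locality$sendKeysToElement(list("Nungambakkam"))

# "#dealer-submit" is a guess -- inspect the page to find the real button selector
submit <- remDr$findElement(using = "css selector", "#dealer-submit")
submit$clickElement()

Sys.sleep(5)  # give the results time to render

# Hand the rendered page source back to rvest and reuse your selectors
page <- read_html(remDr$getPageSource()[[1]])
html_text(html_nodes(page, ".org_name"))

remDr$close()
rD$server$stop()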

Thank you for the quick reply. I will go through RSelenium and try to rewrite the code.
However, can't we modify this code by discarding all the items that error out? Please guide me on that if you can.
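
Something along these lines is what I have in mind (a rough sketch that keeps only the selectors that actually return text; it would still need the surviving columns to be the same length):

library(rvest)

selectors <- c(State        = "#state",
               City         = "#city",
               Locality     = "#locality",
               Organisation = ".org_name",
               Del_Name     = ".d_name",
               Add          = ".d_add",
               PIN          = ".pin_clas",
               Phone        = ".phone_clas",
               Mobile       = ".mobile_class",
               eMail        = ".email_clas")

# Scrape every selector, then drop the ones that came back empty
results <- lapply(selectors, function(s) html_text(html_nodes(page, s)))
results <- results[lengths(results) > 0]

# Only works if the remaining columns all have the same number of rows
Dealer_Daiken <- as.data.frame(results, stringsAsFactors = FALSE)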
