Not able to extract table from webpage

I am trying to extract all the tables on the webpage, but when I extract the data I get only the table headings. Please help; I am extremely new to this. I have attached a picture of the results I am getting.
Website: https://pgems2018.wbsec.org/PublicPages/VotingResult2018.aspx

library(dplyr)
library(stringr)
library(purrr)
library(rvest)
library(RSelenium)


# run in terminal: docker run -d -p 4445:4444 selenium/standalone-chrome


remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                 port = 4445L,
                                 browserName = "chrome")
remDr$open()

# navigate to the website
remDr$navigate("https://pgems2018.wbsec.org/PublicPages/VotingResult2018.aspx")


# Give the page some time to load
Sys.sleep(4)
# Increase the window size to find elements
remDr$maxWindowSize()

remDr$screenshot(display = TRUE) #This will take a screenshot and display it in the RStudio viewer

# Read page source
source <- remDr$getPageSource()[[1]]

# Election type
list_type <- read_html(source) |>
  html_nodes(css = "#ContentPlaceHolder1_cmbCandidateFor") |>
  html_nodes("option") |>
  html_text()
 
list_type <- list_type[-1]

# Zilla Parishad Name 
district <- read_html(source) |>
  html_nodes(css = "#ContentPlaceHolder1_cmbZillaParisadName") |>
  html_nodes("option") |>
  html_attr("value")

district <- district[-1]

# Click on election dropdown list
election_type <- remDr$findElement(using = "css selector", 
                                   value = "#ContentPlaceHolder1_cmbCandidateFor")
election_type$clickElement()
Sys.sleep(4)

# Select election type
ZillaP <- remDr$findElement(using = "css selector", value = "#ContentPlaceHolder1_cmbCandidateFor > option:nth-child(2)" )

ZillaP$clickElement() # selected Zilla Parishad
Sys.sleep(4)

# Preallocate districts
data_district <- vector("list", length(district))

# Iterate over districts
for (k in seq_along(district)){
  
  # Open the Zilla Parishad dropdown list
  district_list <- remDr$findElement(using = "css selector",
                                     value = "#ContentPlaceHolder1_cmbZillaParisadName")
  district_list$clickElement()  # the parentheses are needed to actually perform the click
  Sys.sleep(4)
  
  # Click the corresponding district
  district_current <- remDr$findElement(using = "css selector", value = str_c("select[id = ContentPlaceHolder1_cmbZillaParisadName] > option[value='", district[[k]], "']"))
  
  district_current$clickElement()
  Sys.sleep(2)
  
  # Click on the search button
  search_button <- remDr$findElement(using = "css selector",
                                     value = "#ContentPlaceHolder1_btnSearch")
  search_button$clickElement()
  Sys.sleep(4)
  
  # Populate the element at the corresponding position (first page)
  data_district[[k]] <- remDr$getPageSource()[[1]] |>
    read_html() |>
    html_table()
}

It would help if you could post your code in a code chunk. It would make it easier to read it. Also, your webpage link does not work on my end for some reason.

Hi, it looks like your code was not formatted in a way that makes it easy to read for people trying to help you. Formatted code lets people identify more easily where issues may be occurring, and it is easier to read in general. I have edited your post to format the code properly.

In the future please put code that is inline (such as a function name, like mutate or filter) inside of backticks (`mutate`) and chunks of code (including error messages and code copied from the console) can be put between sets of three backticks:

```
example <- foo %>%
  filter(a == 1)
```

This process can be done automatically by highlighting your code, either inline or in a chunk, and clicking the </> button on the toolbar of the reply window!

This will help keep our community tidy and help you get the help you are looking for!

For more information, please take a look at the community's FAQ on formatting code.

I formatted the code and added the website link differently.

The main issue is that you did not use a CSS selector to point to the exact table that contains the data you are trying to scrape. When you click on the "Search" button, what you see is essentially two table elements stacked on top of each other. The first table contains only the column names (i.e. Sl.No, Candidate Name, ...). It is the second table that contains the data. Your code scraped the first table, which is why you ended up with column names only and no data.
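To see the two-table layout in isolation, here is a minimal sketch that runs without a browser. The HTML is a made-up stand-in for the results page: I am assuming the header table is the one with id `ContentPlaceHolder1_dgvHeader` and the data table is `ContentPlaceHolder1_dgvResultList`, and the candidate rows are invented for illustration.

```r
library(rvest)

# Hypothetical stand-in for the results page: a header-only table followed
# by a data table. The row content is invented.
page <- minimal_html('
  <table id="ContentPlaceHolder1_dgvHeader">
    <tr><th>Sl.No</th><th>Candidate Name</th></tr>
  </table>
  <table id="ContentPlaceHolder1_dgvResultList">
    <tr><td>1</td><td>Candidate A</td></tr>
    <tr><td>2</td><td>Candidate B</td></tr>
  </table>
')

# The page holds two separate <table> elements
tables <- html_elements(page, "table")
length(tables)

# Selecting the second table by its id returns only the data rows
results <- page |>
  html_element("#ContentPlaceHolder1_dgvResultList") |>
  html_table()
results
```

Because the data table in this sketch has no header row of its own, `html_table()` gives the columns default names; you would probably want to set the column names yourself afterwards.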

The full script below scrapes only the second table. The part to focus on is the for loop at the end:

# Load packages ----

library(glue)
library(netstat)
library(purrr)
library(RSelenium)
library(rvest)


# Set up a Selenium server ----

driver <- rsDriver(
  port = free_port(random = TRUE),
  browser = "firefox",
  chromever = NULL,
  check = TRUE
)

remote_driver <- driver$client
remote_driver$navigate(url = "https://pgems2018.wbsec.org/PublicPages/VotingResult2018.aspx")


# Get elements of interest ----

# > "Candidate for" dropdown menu
candidate_for_dropdown <- remote_driver$findElement(using = "css", value = "#ContentPlaceHolder1_cmbCandidateFor")

# > "Zilla Parishad Name" dropdown menu
zilla_pname_dropdown <- remote_driver$findElement(using = "css", value = "#ContentPlaceHolder1_cmbZillaParisadName")

# > Get "Search" button element ----
# search_button <- remote_driver$findElement(using = "css", value = "#ContentPlaceHolder1_btnSearch")


# Get all options from the "Candidate for" dropdown menu ----

html_source <- remote_driver$getPageSource()[[1]]

list_type <- read_html(html_source) |>
  html_node(css = "#ContentPlaceHolder1_cmbCandidateFor") |>
  html_nodes("option") |>
  html_text() %>%
  .[-1]

# Get all options from the "Zilla Parishad Name" dropdown menu ----

district <- read_html(html_source) |>
  html_node(css = "#ContentPlaceHolder1_cmbZillaParisadName") |>
  html_nodes("option") |>
  html_text() %>%
  .[-1]

n_district <- length(district)


# Set "Candidate For" dropdown menu to "Zilla Parishad" ----

candidate_for_dropdown$clickElement()
zilla_parishad_option <- remote_driver$findElement(using = "css", value = "#ContentPlaceHolder1_cmbCandidateFor > option:nth-child(2)")
zilla_parishad_option$clickElement()


# Scrape the table for each district ----

# Preallocate one slot per district
table_list <- vector(mode = "list", length = n_district)

for(i in seq_len(n_district) + 1){
  
  # Find current district and click on it
  
  zilla_pname_dropdown <- remote_driver$findElement(using = "css", value = "#ContentPlaceHolder1_cmbZillaParisadName")
  zilla_pname_dropdown$clickElement()
  current_district <- remote_driver$findElement(
    using = "css",
    value = glue("#ContentPlaceHolder1_cmbZillaParisadName > option:nth-child({i})")
  )
  current_district$clickElement()
  Sys.sleep(1)
  
  # Click on the search button 
  
  search_button <- remote_driver$findElement(using = "css", value = "#ContentPlaceHolder1_btnSearch")
  search_button$clickElement()
  Sys.sleep(3)
  
  # Scrape the second table, which holds the data rows
  
  d <- remote_driver$getPageSource()[[1]] %>%
    read_html() %>%
    html_element(css = "#ContentPlaceHolder1_dgvResultList") %>%
    html_table()
  
  print(d)
  
  # Store the table for the current district (i starts at 2)
  table_list[[i - 1]] <- d
  
}
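
If you then want a single data frame rather than a list, one possible way is to name each table after its district and stack them. This is only a sketch: the `district` and `table_list` objects below are small made-up stand-ins for the ones built by the script, and `list_rbind()` needs purrr 1.0.0 or later. It also assumes every district's table has the same columns with compatible types, which I have not checked against the real site.

```r
library(purrr)

# Made-up stand-ins for the objects built by the script above
district <- c("District A", "District B")
table_list <- list(
  data.frame(X1 = 1:2, X2 = c("Candidate A", "Candidate B")),
  data.frame(X1 = 1L, X2 = "Candidate C")
)

# Name each table after its district, then stack the tables and
# record the list names in a new "district" column
all_results <- table_list |>
  set_names(district) |>
  list_rbind(names_to = "district")

all_results
```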

It works. Thank you so much.

I'm glad I was able to help!
