Trouble scraping nested HTML tables

I'm trying to scrape contributions to political campaigns in Nevada for the 2018 election. This information is published in reports like this one.

I've figured out that I want the #ctl04_mobjContributions_dgContributions nodes using the SelectorGadget plugin as described in the rvest vignette, but I'm running into a problem with nested HTML tables.

The HTML page looks like:

The issue is that only some rows (those for contributors who have made multiple contributions) contain nested tables, and the others don't. My first pass at this with rvest chokes on the mix.
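To make the structure concrete, here is a minimal sketch of one way to walk a table where only some rows nest a second table. The markup, the `outer` id, the column names, and the `parse_row()` helper are all hypothetical stand-ins, not the real report's layout. The idea is to select only the outer table's own rows with XPath, then check each cell for a nested table:

```r
library(xml2)

# Hypothetical markup: the "Bob" row nests a table of amounts, "Alice" does not.
html <- '
<table id="outer">
  <tr><th>Name</th><th>Amount</th></tr>
  <tr><td>Alice</td><td>100</td></tr>
  <tr><td>Bob</td><td>
    <table>
      <tr><td>50</td></tr>
      <tr><td>75</td></tr>
    </table>
  </td></tr>
</table>'

pg <- read_html(html)
outer <- xml_find_first(pg, "//table[@id='outer']")

# Only rows that are direct children of the outer table (with or without a
# <tbody>) and that contain <td> cells, so nested rows and the header are skipped.
rows <- xml_find_all(outer, "./tr[td] | ./tbody/tr[td]")

# Hypothetical helper: one output row per amount, repeating the name as needed.
parse_row <- function(row) {
  cells <- xml_find_all(row, "./td")
  name <- xml_text(cells[[1]], trim = TRUE)
  nested <- xml_find_all(cells[[2]], ".//table//td")
  amounts <- if (length(nested) > 0) {
    xml_text(nested, trim = TRUE)
  } else {
    xml_text(cells[[2]], trim = TRUE)
  }
  data.frame(name = name, amount = as.numeric(amounts), stringsAsFactors = FALSE)
}

contributions <- do.call(rbind, lapply(rows, parse_row))
contributions
```

With this toy input the result has one row for Alice and two for Bob. The real page would need its own XPath expressions and column positions.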

Unfortunately, I have a second problem as well: sometimes when I scrape I get something, and sometimes I don't. At the moment, I'm getting nothing:

library(rvest)
#> Loading required package: xml2
library(tidyverse)

url <- "https://www.nvsos.gov/SOSCandidateServices/AnonymousAccess/ViewCCEReport.aspx?syn=LFKB0Mi%252b7KWJ7Ij1s7NC8g%253d%253d"

pg <- read_html(url)

pg %>%
  html_nodes("#ctl04_mobjContributions_dgContributions") %>% 
  html_table(header = TRUE, fill = TRUE) %>% 
  as_tibble(.name_repair = "unique")
#> # A tibble: 0 x 0

Created on 2019-05-06 by the reprex package (v0.2.1)

I have copies of pg saved to an HTML file when I get things and also when I don't.
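One quick way to compare the two saved copies is to check whether the target node is present in each at all. This is only a sketch: `has_node()` is a hypothetical helper, and the two inline pages below are stand-ins for the saved files, which would instead be read with `read_html()` on their paths.

```r
library(xml2)

# Hypothetical helper: TRUE if the parsed page has an element with this id.
has_node <- function(page, id) {
  xpath <- sprintf("//*[@id='%s']", id)
  length(xml_find_all(page, xpath)) > 0
}

target_id <- "ctl04_mobjContributions_dgContributions"

# Stand-ins for the saved "worked" and "did not work" copies of the page.
good <- read_html(paste0(
  '<html><body><table id="', target_id, '">',
  "<tr><td>x</td></tr></table></body></html>"
))
bad <- read_html("<html><body><p>No report data</p></body></html>")

has_node(good, target_id)
#> [1] TRUE
has_node(bad, target_id)
#> [1] FALSE
```

If the node is missing from the failing copy, the server returned a different page, so the problem lies in the request rather than in the selector.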

When things do work, I've started down the xml2::as_list() road, but it's really tedious and slow, so I'm hoping for a more pleasant solution.
