I'm trying to scrape contributions to political campaigns in Nevada for the 2018 election. This information is published in reports like this one.
Using the SelectorGadget plugin, as described in the rvest vignette, I've figured out that I want the `#ctl04_mobjContributions_dgContributions` nodes, but I'm running into a problem with nested HTML tables.
The HTML page looks like:
The issue is that only some rows (those for contributors who have made multiple contributions) have nested tables, and the others don't. My first pass with rvest chokes on this.
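One plausible reason `html_table(fill = TRUE)` chokes is that the nested table's rows get mixed in with the outer table's rows. The page's real markup isn't reproduced here, so the following is only a sketch against hypothetical markup (the `contribs` id, the two-column layout and the sample names and amounts are all made up). It selects only the outer table's own rows with XPath, then expands any nested table into one row per contribution:

```r
library(rvest)

# Hypothetical markup standing in for the real report: Bob made two
# contributions, so his "Amount" cell holds a nested table; Alice's does not.
html <- '<table id="contribs">
  <tr><th>Contributor</th><th>Amount</th></tr>
  <tr><td>Alice</td><td>$100</td></tr>
  <tr><td>Bob</td><td><table>
    <tr><td>$50</td></tr>
    <tr><td>$25</td></tr>
  </table></td></tr>
</table>'

pg <- read_html(html)
outer <- html_node(pg, "#contribs")

# "./tr" matches only rows that are direct children of the outer table
# (or of its <tbody>, if one is present), so nested rows are not picked up.
# The "[td]" predicate skips the header row, which contains only <th> cells.
rows <- html_nodes(outer, xpath = "./tr[td] | ./tbody/tr[td]")

# Turn one top-level row into one data frame row per contribution.
parse_row <- function(row) {
  cells <- html_nodes(row, xpath = "./td")
  contributor <- html_text(cells[[1]], trim = TRUE)
  nested <- html_nodes(cells[[2]], xpath = ".//td")
  amount <- if (length(nested) > 0) {
    html_text(nested, trim = TRUE)       # several contributions
  } else {
    html_text(cells[[2]], trim = TRUE)   # a single contribution
  }
  data.frame(contributor = contributor, amount = amount,
             stringsAsFactors = FALSE)
}

contribs <- do.call(rbind, lapply(rows, parse_row))
contribs
```

The column positions (`cells[[1]]`, `cells[[2]]`) are assumptions and would need to match the real report's layout.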
Unfortunately, I have a second problem as well: sometimes when I scrape I get something, and sometimes I don't. At the moment, I'm getting nothing:
```r
library(rvest)
#> Loading required package: xml2
library(tidyverse)

url <- "https://www.nvsos.gov/SOSCandidateServices/AnonymousAccess/ViewCCEReport.aspx?syn=LFKB0Mi%252b7KWJ7Ij1s7NC8g%253d%253d"
pg <- read_html(url)

pg %>%
  html_nodes("#ctl04_mobjContributions_dgContributions") %>%
  html_table(header = TRUE, fill = TRUE) %>%
  as_tibble(.name_repair = "unique")
#> # A tibble: 0 x 0
```
Created on 2019-05-06 by the reprex package (v0.2.1)
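Why the page sometimes comes back without the table isn't clear from the output above. If the empty result is transient, one defensive option is to check whether the selector matched anything before parsing and retry after a short pause. `read_until_found()` is a hypothetical helper, not part of rvest, and it does not diagnose the underlying cause:

```r
library(rvest)

# Hypothetical helper: fetch `url` and return the nodes matching `selector`,
# retrying up to `tries` times (pausing `wait` seconds between attempts)
# when nothing matches. `url` may also be a literal HTML string, because
# read_html() accepts both, which makes the helper easy to test offline.
read_until_found <- function(url, selector, tries = 3, wait = 2) {
  for (i in seq_len(tries)) {
    pg <- read_html(url)
    nodes <- html_nodes(pg, selector)
    if (length(nodes) > 0) return(nodes)
    if (i < tries) Sys.sleep(wait)
  }
  stop("No nodes matched '", selector, "' after ", tries, " attempts")
}
```

For example, `nodes <- read_until_found(url, "#ctl04_mobjContributions_dgContributions")`. If every attempt fails, it may help to save the raw response (for instance with `xml2::write_html()`) and check whether the server sent back something other than the report.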
When the scrape does work, I started down the `xml2::as_list()` road, but it's really tedious and slow, so I'm hoping for a more pleasant solution.
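As an alternative to walking the tree with `xml2::as_list()`, XPath can do the flattening in one vectorized pass: select every "leaf" amount cell, then use the `ancestor` axis to climb back to the owning outer row and read the contributor. This is a sketch against hypothetical markup (the `contribs` id, the column positions and the sample values are assumptions), not the real NVSOS layout:

```r
library(rvest)

# Hypothetical markup: one outer row per contributor; a contributor with
# several contributions has a nested table in the second cell.
html <- '<table id="contribs">
  <tr><th>Contributor</th><th>Amount</th></tr>
  <tr><td>Alice</td><td>$100</td></tr>
  <tr><td>Bob</td><td><table><tr><td>$50</td></tr><tr><td>$25</td></tr></table></td></tr>
</table>'
pg <- read_html(html)

# A row of the outer table is a <tr> whose nearest enclosing <table> is
# "contribs"; this holds whether or not the parser inserted a <tbody>.
outer_row <- "tr[ancestor::table[1][@id='contribs']]"

# Leaf amount cells: second cells with no nested table, plus every cell
# inside a nested table. The union is returned in document order.
amount_xpath <- sprintf(
  "//%s/td[2][not(.//table)] | //%s/td[2]//td",
  outer_row, outer_row
)
# From any amount cell, climb to the outer row and take its first cell.
contributor_xpath <- sprintf("ancestor::%s/td[1]", outer_row)

amount_cells <- html_nodes(pg, xpath = amount_xpath)
contributor_cells <- html_node(amount_cells, xpath = contributor_xpath)

contribs <- tibble::tibble(
  contributor = html_text(contributor_cells, trim = TRUE),
  amount = html_text(amount_cells, trim = TRUE)
)
contribs
```

Because `html_node()` returns exactly one result per input node, the two columns stay aligned without any per-row loop.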