I'm trying to scrape contributions to political campaigns in Nevada for the 2018 election. This information is published in reports like this one.
Using the SelectorGadget plugin as described in the rvest vignette, I've figured out that I want the #ctl04_mobjContributions_dgContributions nodes, but I'm running into a problem with nested HTML tables.
The HTML page looks like:
The issue is that only some rows (those corresponding to contributors who have made multiple contributions) have nested tables, and the others don't. My first pass at this with rvest chokes on this.
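To make the structure concrete, here is a minimal sketch using made-up HTML, not the real report markup. The table id `contribs`, the two-column layout, and the helper `parse_row()` are all assumptions for illustration. The idea is to walk only the direct child rows of the outer table with XPath, then flatten any nested table found inside a cell, so that nested rows are not mistaken for outer rows.

```r
library(rvest)

# Made-up stand-in for the report; NOT copied from the real page.
html <- '
<table id="contribs">
  <tr><th>Name</th><th>Amount</th></tr>
  <tr><td>Alice</td><td>100</td></tr>
  <tr><td>Bob</td><td>
    <table>
      <tr><td>50</td></tr>
      <tr><td>75</td></tr>
    </table>
  </td></tr>
</table>'

toy <- read_html(html)
outer <- html_node(toy, xpath = '//table[@id="contribs"]')

# Direct child rows only (with or without a <tbody>), so rows that
# belong to nested tables are not picked up here.
rows <- html_nodes(outer, xpath = "./tr | ./tbody/tr")

# Hypothetical helper: one data frame per outer row, with one line per
# nested amount when the second cell holds a nested table.
parse_row <- function(row) {
  cells <- html_nodes(row, xpath = "./td")
  if (length(cells) == 0) return(NULL)  # header row has <th>, not <td>
  name <- html_text(cells[[1]], trim = TRUE)
  nested <- html_nodes(cells[[2]], xpath = ".//table//td")
  amount <- if (length(nested) > 0) {
    html_text(nested, trim = TRUE)
  } else {
    html_text(cells[[2]], trim = TRUE)
  }
  data.frame(name = name, amount = as.numeric(amount),
             stringsAsFactors = FALSE)
}

parsed <- Filter(Negate(is.null), lapply(rows, parse_row))
result <- do.call(rbind, parsed)
```

On the real page the selector and the column positions would differ, so this only shows the shape of the approach rather than something that runs against the report as-is.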
Unfortunately, I have a second problem as well: sometimes when I scrape I get something, and sometimes I don't. At the moment, I'm getting nothing:
library(rvest)
#> Loading required package: xml2
library(tidyverse)
url <- "https://www.nvsos.gov/SOSCandidateServices/AnonymousAccess/ViewCCEReport.aspx?syn=LFKB0Mi%252b7KWJ7Ij1s7NC8g%253d%253d"
pg <- read_html(url)
pg %>%
  html_nodes("#ctl04_mobjContributions_dgContributions") %>%
  html_table(header = TRUE, fill = TRUE) %>%
  as_tibble(.name_repair = "unique")
#> # A tibble: 0 x 0
Created on 2019-05-06 by the reprex package (v0.2.1)
I have copies of pg saved to HTML files, both from when I get things and from when I don't.
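One way to tell the two cases apart is to check whether the table node is present in the parsed page at all before converting it. The sketch below uses two made-up documents rather than my saved files; the helper name `table_or_null()` and the id `t` are hypothetical.

```r
library(rvest)

# Hypothetical helper: return the first matching table as a data frame,
# or NULL when the selector matches nothing in the parsed page.
table_or_null <- function(pg, css) {
  nodes <- html_nodes(pg, css)
  if (length(nodes) == 0) {
    return(NULL)
  }
  html_table(nodes[[1]], header = TRUE)
}

# Made-up documents standing in for a "good" and a "bad" response.
good <- read_html('<table id="t"><tr><th>a</th></tr><tr><td>1</td></tr></table>')
bad  <- read_html('<p>no table here</p>')

good_tbl <- table_or_null(good, "#t")
bad_tbl  <- table_or_null(bad, "#t")
```

If this returns NULL for a saved copy, the table was never in the response, which would point at the request rather than at the parsing step.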
When things do work, I started down the xml2::as_list() road, but it's really tedious and slow, so I'm hoping for a more pleasant solution.