Each week I download and process data from a hyperlink on a website, but the link changes each week as new data is uploaded to the site. The hyperlink text doesn't change, so I'm wondering how I can reference that text instead of updating the lengthy full URL each week. The website is https://www.esd.wa.gov/labormarketinfo/unemployment-insurance-data. Ideally I'd like to pull the data using the hyperlink text "initial claims by county," but I'm not well versed enough in HTML to know how to do this. Any help is much appreciated!
Below is the code I'm using:
library(readxl)
# Direct link to the current file; this URL changes each week
IC_Claims_by_County <- "https://esdorchardstorage.blob.core.windows.net/esdwa/Default/ESDWAGOV/labor-market-info/Libraries/Regional-reports/UI-Claims-Karen/COVID19%20Docs/County%20weekly%20initial%20claims%20for%202020%20(21).xlsx"
download.file(IC_Claims_by_County,"UI_Initial_Claims_County.xlsx", method = "curl")
IC_Claims_by_County <- data.frame(read_excel("UI_Initial_Claims_County.xlsx", sheet = "Weekly county initial claims"))
library(xml2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
url="https://www.esd.wa.gov/labormarketinfo/unemployment-insurance-data"
doc1 =xml2::read_html(url)
doc1_a = xml2::xml_find_all(doc1, "//a")
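# Return the link text (c1) and href (c2) of one <a> node as a one-row tibble
unpack_a <- function (doc) {
  c1 = xml2::xml_text(doc)
  c2 = xml2::xml_attr(doc, 'href')
  # xml_attr() returns NA (not NULL) when the attribute is missing
  c1 = ifelse(is.na(c1),'',c1)
  c2 = ifelse(is.na(c2),'',c2)
  tibble::tibble(c1 = c1, c2 = c2)
}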
xx1 = purrr::map_dfr(doc1_a,unpack_a) %>%
dplyr::filter(c1 == 'initial claims by County') %>%
dplyr::pull(c2)
xx1
#> [1] "https://esdorchardstorage.blob.core.windows.net/esdwa/Default/ESDWAGOV/labor-market-info/Libraries/Regional-reports/UI-Claims-Karen/COVID19%20Docs/County%20weekly%20initial%20claims%20for%202020%20(21).xlsx"
Created on 2020-06-09 by the reprex package (v0.3.0)
That works perfectly! Thanks so much @HanOostdijk !
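For anyone adapting the answer above, here is a minimal sketch of the same idea wrapped in a helper, so the lookup and the weekly download can be chained. The function name `find_link_by_text` and the inline demo HTML are hypothetical and only there for illustration; the match ignores case and accepts partial text, which is an assumption about what is convenient here, not something the site guarantees.

```r
library(xml2)

# Hypothetical helper: return the href of the first <a> whose text
# contains `label`, ignoring case. Stops with an error if nothing matches.
find_link_by_text <- function(doc, label) {
  links <- xml_find_all(doc, "//a[@href]")
  hit <- grepl(tolower(label), tolower(xml_text(links)), fixed = TRUE)
  if (!any(hit)) stop("No link found with text containing: ", label)
  xml_attr(links, "href")[hit][1]
}

# Offline demo on made-up HTML (the example.com URLs are placeholders)
demo <- read_html(paste0(
  '<p><a href="https://example.com/a.xlsx">Initial claims by County</a> ',
  '<a href="https://example.com/b.xlsx">Continued claims</a></p>'
))
demo_link <- find_link_by_text(demo, "initial claims by county")
print(demo_link)

# Against the real page (not run here, since it needs network access):
# page <- read_html("https://www.esd.wa.gov/labormarketinfo/unemployment-insurance-data")
# link <- find_link_by_text(page, "initial claims by county")
# download.file(link, "UI_Initial_Claims_County.xlsx", mode = "wb")
```

If the page ever switches to relative links, `xml2::url_absolute()` can turn the href into a full URL before it is passed to `download.file()`.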