I'm scraping the following list of URLs:
library(xml2)
library(rvest)
library(stringr)
URL <- "https://thedataweb.rm.census.gov/ftp/cps_ftp.html"
pg <- read_html(URL)
head(html_attr(html_nodes(pg, "a"), "href"))
#> [1] "#cpscert" "#cpsbasic" "#cpsbasic_extract"
#> [4] "#cpsmarch" "#cpssupps" "#cpsrepwgt"
links <- html_attr(html_nodes(pg, "a"), "href")
zips <- str_subset(links, "zip")
zips[[1]]
#> [1] "http://thedataweb.rm.census.gov/pub/cps/supps/jan15-dec15cert_ext.zip"
# I want to get "jan15-dec15cert_ext"
Created on 2019-03-06 by the reprex package (v0.2.1)
I would like to extract the file names (without the path or the .zip extension) from zips. For example, from zips[[1]] I want to get jan15-dec15cert_ext. Can somebody help me with this regular expression magic?
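To make the target concrete, here is a minimal sketch of the result I'm after. It assumes base R's `basename()` and `tools::file_path_sans_ext()` are acceptable, and shows a plain regex with `sub()` as an alternative. The `zips` vector is hard-coded to the first link from the output above so the snippet runs without scraping; the names `file_names` and `file_names_regex` are just illustrative.

```r
# Stand-in for the scraped vector: only the first link from the reprex above
zips <- "http://thedataweb.rm.census.gov/pub/cps/supps/jan15-dec15cert_ext.zip"

# Option 1 (no regex): drop the directory part, then drop the extension
file_names <- tools::file_path_sans_ext(basename(zips))
file_names
#> [1] "jan15-dec15cert_ext"

# Option 2 (regex): capture everything after the last "/" and before ".zip"
file_names_regex <- sub(".*/([^/]+)\\.zip$", "\\1", zips)
file_names_regex
#> [1] "jan15-dec15cert_ext"
```

Both calls are vectorised, so they should work unchanged on the full `zips` vector. Note that `sub()` returns any element that does not end in `.zip` unchanged, so the regex version relies on `zips` having been filtered first.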