Improving the names of the output PDFs when scraping with rvest

Hi R community,
I scraped a site with rvest, but the names of the downloaded PDFs (the outputs) were not good. I am trying to improve them and would appreciate some tips, if possible.

In the end, I would like the PDFs to be named something like "year_names_.pdf" or

Below is what I have tried. The problem is that I could not carry the columns "year" and "names", created in the first loop, over into the second loop. Is that possible?

Main pages

library(rvest)
library(stringr)

url <- ""

urls_main_pages <- c(url, paste0(url, 2:3))

main_dt <- data.frame()

# patterns to remove from the links to form the names
pattern <- ""
pattern_2 <- ""
pattern_3 <- ""
pattern_4 <- ""
pattern_5 <- ""
pattern_6 <- ""

# get links of the main pages
for (i in seq_along(urls_main_pages)) {
  pages_html <- read_html(urls_main_pages[i])
  nodes <- html_nodes(pages_html, '.prova_download')
  links <- html_attr(nodes, "href")
  main_dt <- rbind(main_dt, data.frame(links = links, stringsAsFactors = FALSE))
}

# extract names and years once all the links have been collected
names <- str_remove_all(main_dt$links, pattern)
names <- str_remove_all(names, pattern_2)
names <- str_remove_all(names, pattern_3)
names <- str_remove_all(names, pattern_4)
names <- str_remove_all(names, pattern_5)
names <- str_remove_all(names, pattern_6)
year <- str_extract_all(names, "[\\d]{4}+", simplify = TRUE)
names <- str_remove_all(names, "-[\\d]+")
# here I have created a data frame with year and names, which
# I would like to keep to use in the names of the PDFs in the end.
main_dt_2 <- cbind(main_dt, year, names)
main_dt_2$links <- as.character(main_dt_2$links)
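For readers following along, here is a minimal sketch of the year and name extraction step on made-up links. The example.com URLs and the slug layout ("name-year" as the last path segment) are assumptions for illustration only, since the real URL and patterns are not shown in the post.

```r
library(stringr)

# Hypothetical links; in the post these come from html_attr(nodes, "href")
links <- c(
  "https://example.com/provas/analista-judiciario-2019",
  "https://example.com/provas/tecnico-2017"
)

# Keep only the last path segment of each link
slug <- basename(links)

# First run of four digits is taken as the year (NA if there is none)
year <- str_extract(slug, "\\d{4}")

# Remove every "-<digits>" part to leave the name
names <- str_remove_all(slug, "-\\d+")

# One row per link, with the year and name stored next to it
main_dt_2 <- data.frame(links, year, names, stringsAsFactors = FALSE)
print(main_dt_2)
```

Using `str_extract()` rather than `str_extract_all()` gives one value per link, which keeps `year` a plain character column instead of a matrix.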

Child pages

links_pdf <- data.frame()

for (i in seq_along(main_dt_2$links)) {
  link_page <- read_html(main_dt_2$links[i])
  link_page <- html_nodes(link_page, xpath = '//*[@id="download"]/ul[3]')
  link_page <- html_nodes(link_page, 'a')
  link_page <- html_attr(link_page, "href")
  links_pdf <- rbind(links_pdf, cbind(link_page))
  # Here, or somewhere inside this loop, I would like to
  # join the columns year and names from main_dt_2 to links_pdf,
  # so that I can use them in the names of the PDFs in the end
}
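One possible way to do this, sketched under assumptions: inside the loop, build a small data frame per page that repeats that page's year and names next to each PDF link, then bind the pieces together. `get_pdf_links()` is a hypothetical stand-in for the `read_html()` / `html_nodes()` / `html_attr()` steps so the example runs without a network connection, and the example.com values and the running number in the file name are made up for illustration.

```r
# Stand-in for main_dt_2 with hypothetical values
main_dt_2 <- data.frame(
  links = c("https://example.com/page-a", "https://example.com/page-b"),
  year  = c("2019", "2017"),
  names = c("analista", "tecnico"),
  stringsAsFactors = FALSE
)

# Hypothetical replacement for the scraping steps of the second loop:
# returns the PDF links found on one child page
get_pdf_links <- function(page_url) {
  fake <- list(
    "https://example.com/page-a" = c("https://example.com/a1.pdf",
                                     "https://example.com/a2.pdf"),
    "https://example.com/page-b" = "https://example.com/b1.pdf"
  )
  fake[[page_url]]
}

pieces <- vector("list", nrow(main_dt_2))
for (i in seq_len(nrow(main_dt_2))) {
  link_page <- get_pdf_links(main_dt_2$links[i])
  if (length(link_page) == 0) next  # page without PDF links
  # year and names are recycled to one row per PDF link
  pieces[[i]] <- data.frame(
    link_page = link_page,
    year      = main_dt_2$year[i],
    names     = main_dt_2$names[i],
    stringsAsFactors = FALSE
  )
}
links_pdf <- do.call(rbind, pieces)

# Number the PDFs within each page so the file names stay unique
links_pdf$n <- ave(seq_along(links_pdf$link_page),
                   links_pdf$year, links_pdf$names, FUN = seq_along)
links_pdf$file <- paste0(links_pdf$year, "_", links_pdf$names, "_",
                         links_pdf$n, ".pdf")
print(links_pdf$file)

# With real links, each PDF could then be saved under its new name, e.g.:
# download.file(links_pdf$link_page[1], destfile = links_pdf$file[1], mode = "wb")
```

Collecting the pieces in a list and binding once at the end also avoids growing `links_pdf` with `rbind()` on every iteration.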

Thanks in advance and happy coding,
