Extracting PDF URLs from a webpage

Hi everyone,

I am new to R and trying to extract PDF URLs from this website. My goal is to download the PDFs from 2017 to date.

https://dekalbcountyga.legistar.com/DepartmentDetail.aspx?ID=29350&GUID=A51C5572-654E-4DD9-A867-093BF2943C47&R=481ebb46-ce9d-4d0f-a1e2-a3e7895be9c2

Here is my code:

```r
library(rvest)

webpage_url <- "https://dekalbcountyga.legistar.com/DepartmentDetail.aspx?ID=29350&GUID=A51C5572-654E-4DD9-A867-093BF2943C47&R=481ebb46-ce9d-4d0f-a1e2-a3e7895be9c2"
webpage <- read_html(webpage_url)

link <- webpage %>%
  html_nodes("td:nth-child(8) a") %>%
  html_attr("href")
```

I got NA NA. Could someone kindly assist?

Thank you.

The target has two records, one of which has no data other than date, time and location. NAs are to be expected.
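If you do get hrefs back, something like the sketch below drops those expected NAs and resolves relative links against the page URL. This is a minimal sketch that reuses the selector from the original post and assumes the links are present in the static HTML.

```r
library(rvest)
library(xml2)

webpage_url <- "https://dekalbcountyga.legistar.com/DepartmentDetail.aspx?ID=29350&GUID=A51C5572-654E-4DD9-A867-093BF2943C47&R=481ebb46-ce9d-4d0f-a1e2-a3e7895be9c2"
webpage <- read_html(webpage_url)

links <- webpage %>%
  html_nodes("td:nth-child(8) a") %>%
  html_attr("href")

links <- links[!is.na(links)]              # drop rows that have no document
links <- url_absolute(links, webpage_url)  # resolve relative hrefs to full URLs
```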

Thanks for your response. I used SelectorGadget to target the minutes-summary column (html_nodes("td:nth-child(8) a")) in the table. I thought that would fetch all the URLs associated with the node I selected.

How can I better target the [minutes summary](https://dekalbcountyga.legistar.com/DepartmentDetail.aspx?ID=29350&GUID=A51C5572-654E-4DD9-A867-093BF2943C47&R=481ebb46-ce9d-4d0f-a1e2-a3e7895be9c2) column using rvest? I would be grateful for any useful guide.
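One way to target the column more robustly is to look it up by its header text instead of a hard-coded position. This is a rough sketch: the "Minutes" header label, and the assumption that the header cells are `<th>` elements, may not match the live page.

```r
library(rvest)

webpage_url <- "https://dekalbcountyga.legistar.com/DepartmentDetail.aspx?ID=29350&GUID=A51C5572-654E-4DD9-A867-093BF2943C47&R=481ebb46-ce9d-4d0f-a1e2-a3e7895be9c2"
webpage <- read_html(webpage_url)

# Find which column header mentions "Minutes" (assumed label)
headers <- webpage %>%
  html_nodes("th") %>%
  html_text(trim = TRUE)
col <- which(grepl("Minutes", headers, ignore.case = TRUE))[1]

# Build an nth-child selector from the matched column index
minutes_links <- webpage %>%
  html_nodes(paste0("td:nth-child(", col, ") a")) %>%
  html_attr("href")
```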

See this S/O on what's involved in scraping ASPX sites that hide content behind JavaScript. From a URL standpoint, all pages reachable from the initial URL rely on the same two arguments, ID=75127 and GUID=31C18E46-AF97-4F40-B497-1FFA1CBE55FA, as an entry point to the JavaScript search functionality. Where it goes from there probably requires close study of the JavaScript code, which I can't help you with. There appear to be four output options: cut-and-paste from the screen, or export to Excel, PDF, or Word. Exporting to Excel and then converting to CSV seems the most promising.
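If you go the export route, a minimal sketch of the last mile might look like this. It assumes you have manually exported the table to a file called minutes_export.xlsx with a column of document URLs named Minutes; both names are placeholders you would adjust to match the actual export.

```r
library(readxl)

minutes <- read_excel("minutes_export.xlsx")  # hypothetical export file name

urls <- minutes$Minutes    # hypothetical column holding the document URLs
urls <- urls[!is.na(urls)]

# Download each document; mode = "wb" keeps binary PDFs intact on Windows
for (i in seq_along(urls)) {
  download.file(urls[i], paste0("minutes_", i, ".pdf"), mode = "wb")
}
```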
