Hi folks,
I just joined the community and am a newbie. I want to look into scraping some data (mostly in Excel format) from various government statistical websites (BLS, USDA, etc.). I did some googling and found lots of info, but I would like to know what the "latest" and possibly easiest packages for scraping are. Are there any good sites or packages that would get me started?
There's a handy post below, Scraping Responsibly with R, that goes into some of the finer details to check, but one rule of thumb is that if there's an API, you should use it and try to avoid scraping.
If you do need to scrape a site, the best tool for the job depends on how the site is generated. rvest is probably the most common one you'll see.
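To make that a bit more concrete, here is a minimal sketch of how rvest could be used to find Excel links on a page. The HTML snippet, the `example.gov` base URL, and the variable names are all made up for illustration; a real BLS or USDA page will need its own selectors. It assumes rvest 1.0.0 or later (for `html_elements()`) and the xml2 package, which rvest depends on.

```r
library(rvest)

# Hypothetical page fragment standing in for a real statistics page.
# For a live site you would use read_html("<the page URL>") instead.
page <- minimal_html('
  <ul>
    <li><a href="/data/table1.xlsx">Table 1</a></li>
    <li><a href="/about.html">About</a></li>
    <li><a href="https://example.gov/files/table2.xls">Table 2</a></li>
  </ul>
')

# Collect the target of every link on the page.
hrefs <- html_attr(html_elements(page, "a"), "href")

# Keep only the links that end in .xls or .xlsx.
excel_links <- hrefs[grepl("\\.xlsx?$", hrefs, ignore.case = TRUE)]

# Resolve relative links against the (hypothetical) base URL.
excel_urls <- xml2::url_absolute(excel_links, "https://example.gov/stats/")

print(excel_urls)
```

From there, one option is to save each file with `download.file(url, destfile, mode = "wb")` and read it with `readxl::read_excel()`, pausing between requests so you don't hammer the server.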
hrbrmstr has a great collection of posts using various stacks to scrape sites with dynamic content: