Scraping (Introduction)

Hi folks,
Just joined the community and am a newbie. I want to scrape some data (mostly in Excel format) from various government statistical websites (BLS, USDA, etc.). I did some googling and found lots of info, but I would like to know which packages are the "latest" and possibly the easiest to use for scraping. Are there any good sites or packages that would get me started?

Thanks,

In this case, I would highly recommend that you take a look at the existing packages that wrap various agency APIs:

There's a handy post below, Scraping Responsibly with R, that goes into some of the nitty-gritty details of what to check, but one rule of thumb is that if there's an API, you should use it and avoid scraping:

If you do need to scrape a site, the best tool for the job depends on how the site is generated (static HTML versus content rendered dynamically in the browser). rvest is probably the most common one you'll see.
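
To give a rough idea of what that looks like for the Excel use case, here is a minimal sketch, assuming the page is static HTML and links to its spreadsheets with ordinary anchor tags. The helper name find_excel_links, the example.gov address, and the sample HTML are all made up for illustration; only the rvest and xml2 calls are real.

```r
library(rvest)

# Hypothetical helper (not part of rvest): collect the links on a parsed
# page that point at .xls or .xlsx files and return them as absolute URLs.
find_excel_links <- function(page, base_url) {
  # Pull the href attribute from every <a> element on the page
  hrefs <- html_attr(html_elements(page, "a"), "href")
  # Drop anchors that have no href at all
  hrefs <- hrefs[!is.na(hrefs)]
  # Keep only links whose path ends in .xls or .xlsx
  excel <- hrefs[grepl("\\.xlsx?$", hrefs, ignore.case = TRUE)]
  # Resolve relative links against the page's address
  xml2::url_absolute(excel, base_url)
}

# Offline demonstration on a made-up page; for a real site you would
# pass read_html("<page address>") instead of minimal_html(...).
page <- minimal_html(
  '<a href="/data/table1.xlsx">Table 1</a>
   <a href="about.html">About</a>
   <a href="archive/old.XLS">Archive</a>'
)
links <- find_excel_links(page, "https://example.gov/stats/")
print(links)
```

From there, each link could be saved with download.file(url, destfile, mode = "wb") and opened with readxl::read_excel(). This approach will not see links that the site inserts with JavaScript after the page loads.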

hrbrmstr has a great collection of posts using various stacks to scrape sites with dynamic content:

Edit: Also, a hrbrmstr-recommended link:
https://towardsdatascience.com/ethics-in-web-scraping-b96b18136f01

@mara
This is great. Thanks very much for the help!