Scraping forum content using R

Hi,

Welcome to the RStudio community!

I think the best way to get started is to read up on the different approaches you can use. We are not here to write code on request, but to help you with problems in code you have already written. I suggest you familiarise yourself with the basics, have a go at it, and if you get stuck, post a reprex here so people can help you with the specific issue. A reprex consists of the minimal code and data needed to recreate the issue or question you have. You can find instructions on how to build and share one here:

That said, there are several approaches to scraping with R, depending on the needs of your project:

  • rvest is one of the most commonly used packages; it helps you read the HTML structure of a page and extract the data you need. Here is a great introductory tutorial on that:
    https://www.dataquest.io/blog/web-scraping-in-r-rvest/
  • If you want to scrape webpages that have interactive elements on them (e.g. buttons that need to be clicked in order for the data to appear) you will need to use something like RSelenium, in which you open a browser (e.g. Chrome) and automate it to perform the clicking / typing actions while scraping. It's a bit of a learning curve, but very powerful. Here's a tutorial: Basics
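
To give a rough idea of what the rvest approach looks like, here is a minimal sketch. It is not tailored to any real forum: the HTML is inlined and the class names (.post, .author, .body) are made up for illustration, so you would need to replace them with selectors that match the page you actually want to scrape.

```r
library(rvest)

# Hypothetical forum markup, inlined so the example runs without a network call.
# For a real page you would use read_html("<url of the page>") instead.
page <- minimal_html('
  <div class="post"><span class="author">alice</span><p class="body">First post</p></div>
  <div class="post"><span class="author">bob</span><p class="body">A reply</p></div>
')

# Select every post, then pull one author and one body out of each.
posts <- html_elements(page, ".post")

forum <- data.frame(
  author = html_text2(html_element(posts, ".author")),
  body   = html_text2(html_element(posts, ".body")),
  stringsAsFactors = FALSE
)

print(forum)
```

Using html_element() (singular) on the set of posts returns exactly one match per post, which keeps the author and body columns aligned even if a post is missing one of the two.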

Hope this helps,
PJ
