Scraping forum content using R

I used the code you shared with another user; however, this forum has no numbered pages (1, 2, 3 ... n). It only has a "Load more" option.

So I want to scrape the content from a forum, say www.forum.com, for a particular keyword list (e.g. Cab, tractor, price, model, etc.).

The result should have the date in one column and the content in another.

Please help me write code based on this. It is also my first time, so it would be a great help if you could explain the intermediate steps.

Another question: is it possible to tell the code that I want to extract only the content for a specific date range?

Thanks in advance

Hi,

Welcome to the RStudio community!

I think the best way to get started is to read up on a few different approaches you can use. We are not here to write code on request, but to help you out with struggles you have with existing code. I suggest you familiarise yourself with the basics, have a go at it, and if you get stuck, post a reprex here so people can help you with the particular issue you're having. A reprex consists of the minimal code and data needed to recreate the issue or question. You can find instructions on how to build and share one here:

That said, to get started, there are several approaches for scraping using R, depending on the needs of your project:

  • rvest is one of the most commonly used packages and helps you read the HTML structure and extract the data you need. Here is a great introductory tutorial on that:
    https://www.dataquest.io/blog/web-scraping-in-r-rvest/
  • If you want to scrape webpages that have interactive elements on them (e.g. buttons that need to be clicked in order for the data to appear) you will need to use something like RSelenium, in which you open a browser (e.g. Chrome) and automate it to perform the clicking / typing actions while scraping. It's a bit of a learning curve, but very powerful. Here's a tutorial: https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html
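To make the two approaches a bit more concrete, here is a minimal sketch of how they might fit together. Everything specific in it is an assumption for illustration, not something taken from your forum: the CSS selectors (div.post, span.date, div.body), the ISO date format, the inline sample HTML, and the helper names parse_posts, filter_posts and load_all_posts are all made up. The RSelenium helper expects a browser session (remDr) that you have already opened yourself and is only defined here, not run.

```r
library(rvest)

# Turn a page's HTML into a data frame with one row per post.
# Assumption: each post is a <div class="post"> containing a
# <span class="date"> and a <div class="body">; adjust the selectors
# to whatever the real forum uses.
parse_posts <- function(html, date_format = "%Y-%m-%d") {
  page  <- read_html(html)
  posts <- html_elements(page, "div.post")
  data.frame(
    date    = as.Date(html_text2(html_element(posts, "span.date")),
                      format = date_format),
    content = html_text2(html_element(posts, "div.body")),
    stringsAsFactors = FALSE
  )
}

# Keep only posts that mention at least one keyword (whole word,
# case-insensitive) and fall inside the date range [from, to].
# Assumption: keywords contain no regex special characters.
filter_posts <- function(posts, keywords, from, to) {
  pattern  <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
  has_kw   <- grepl(pattern, posts$content, ignore.case = TRUE, perl = TRUE)
  in_range <- posts$date >= as.Date(from) & posts$date <= as.Date(to)
  posts[has_kw & in_range, , drop = FALSE]
}

# For a "Load more" forum: click the button until it disappears (or
# max_clicks is reached), then return the fully loaded page source.
# remDr is an RSelenium remote driver you have already started;
# button_css is a hypothetical selector for the "Load more" button.
load_all_posts <- function(remDr, url, button_css, max_clicks = 50) {
  remDr$navigate(url)
  for (i in seq_len(max_clicks)) {
    buttons <- remDr$findElements(using = "css selector", value = button_css)
    if (length(buttons) == 0) break
    buttons[[1]]$clickElement()
    Sys.sleep(2)  # crude wait for the new posts to appear
  }
  remDr$getPageSource()[[1]]
}

# Small inline page standing in for the real forum HTML.
sample_html <- '
<html><body>
  <div class="post"><span class="date">2021-03-01</span>
    <div class="body">The tractor price went up again.</div></div>
  <div class="post"><span class="date">2021-03-15</span>
    <div class="body">Anyone been fishing lately?</div></div>
  <div class="post"><span class="date">2021-06-20</span>
    <div class="body">New cab model looks great.</div></div>
</body></html>'

all_posts <- parse_posts(sample_html)
hits <- filter_posts(all_posts,
                     keywords = c("cab", "tractor", "price", "model"),
                     from = "2021-01-01", to = "2021-04-30")
print(hits)
```

On a real forum, the string returned by load_all_posts() would take the place of sample_html. The date range is applied after scraping, simply by comparing the parsed date column against the two bounds.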

Hope this helps,
PJ
