Help with scraping forum posts using rvest

Hello! I am trying to scrape all the posts on a forum website, following earlier examples found in this community.

However, the last step (building master_data) fails with the following error messages:

  1. Error in mutate(): In argument: messages = map(thread_links, scrape_messages).
    Caused by error in map(): In index: 1.
  2. Caused by error:
    ! './viewforum.php?f=84&sid=616e59608b95e1467d15352e8a3ffe77' does not exist in current working directory.

Could someone please explain what went wrong? Thank you very much!

# Load packages
library(rvest)
library(dplyr)
library(stringr)
library(purrr)
# Scrape thread titles, thread links, authors and number of views
h <- read_html("https://forum.singaporeexpats.com/viewforum.php?f=13&sid=597c6ea1f18d07ad8a8a7e304a78e00b")

threads <- h %>%
  html_nodes("#page-body .list-inner a") %>%
  html_text()

thread_links <- h %>%
  html_nodes("#page-body .list-inner a") %>%
  html_attr(name = "href")
# Custom function to scrape messages in each thread
scrape_messages <- function(thread_link){
  read_html(thread_link) %>%
    html_nodes(css = ".content") %>%
    html_text() %>%
    str_squish()
}
# Create master dataset (and scrape messages in each thread in process)
master_data <- 
  tibble(threads, thread_links) %>%
  mutate(messages = map(thread_links, scrape_messages)) %>%
  select(threads, messages, thread_links)
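
Judging from the error message, the likely cause is that the `href` values on the page are relative (`./viewforum.php?...`). `read_html()` treats a string that is not a full URL as a local file path, hence "does not exist in current working directory". Below is a minimal sketch of one possible fix, not a definitive solution: it resolves each link against the forum URL with `xml2::url_absolute()` before reading it. `resolve_links()` and `build_master_data()` are hypothetical helper names introduced here, and the `viewtopic.php` filter assumes the site follows the usual phpBB URL scheme, which has not been verified against the live page.

```r
library(rvest)
library(dplyr)
library(stringr)
library(purrr)

# Page the relative hrefs are resolved against (URL from the original post)
forum_url <- "https://forum.singaporeexpats.com/viewforum.php?f=13&sid=597c6ea1f18d07ad8a8a7e304a78e00b"

# Hypothetical helper: turn relative hrefs such as "./viewtopic.php?..."
# into full URLs. xml2 is installed as a dependency of rvest.
resolve_links <- function(links, base = forum_url) {
  xml2::url_absolute(links, base)
}

# Offline check, using the href quoted in the error message
example_link <- resolve_links("./viewforum.php?f=84&sid=616e59608b95e1467d15352e8a3ffe77")
print(example_link)

# Same scraping function as in the post; it now expects a full URL
scrape_messages <- function(thread_link) {
  read_html(thread_link) %>%
    html_nodes(css = ".content") %>%
    html_text() %>%
    str_squish()
}

# Hypothetical wrapper around the original pipeline (needs network access)
build_master_data <- function(url = forum_url) {
  anchors <- read_html(url) %>%
    html_nodes("#page-body .list-inner a")

  tibble(
    threads = html_text(anchors),
    thread_links = resolve_links(html_attr(anchors, name = "href"), base = url)
  ) %>%
    # Assumption: thread pages use phpBB-style "viewtopic.php" URLs, so
    # sub-forum and author links are dropped before scraping
    filter(str_detect(thread_links, fixed("viewtopic.php"))) %>%
    mutate(messages = map(thread_links, scrape_messages)) %>%
    select(threads, messages, thread_links)
}

# master_data <- build_master_data()
```

The filter is there because the selector `#page-body .list-inner a` matches every link in each row, and the first link in the error is a `viewforum.php` page rather than a thread. If the site's thread links look different, that line would need adjusting or removing.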
