R Connection timed out after "x" milliseconds

I've been trying to scrape the title, description, and keywords from a large list of websites using rvest in a loop, but R keeps giving me a connection timed out error:

Error in open.connection(x, "rb") : Timeout was reached: Connection timed out after 10000 milliseconds

I found an alternative using RSelenium, but it takes forever to work through the list. Does anyone know a workaround for the timeout error? I tried options(timeout = 9999999), but it had no effect. Here is my code with rvest:

library(rvest)
library(dplyr)
webpages <- data.frame(name = c("amazon", "apple", "usps", "yahoo", "bbc", "ted", "surveymonkey", "forbes", "imdb", "hp"),
                       url = c("http://www.amazon.com",
                               "http://www.apple.com",
                               "http://www.usps.com",
                               "http://www.yahoo.com",
                               "http://www.bbc.com",
                               "http://www.ted.com",
                               "http://www.surveymonkey.com",
                               "http://www.forbes.com",
                               "http://www.imdb.com",
                               "http://www.hp.com"))

# For each row, fetch the page and pull the title, description, and keywords out of <head>
webpages <- apply(webpages, 1, function(x){
  # read_html() opens the connection; this is where the timeout error is raised
  URL <- read_html(x['url'], encoding = "UTF-8")
  
  results <- URL %>% html_nodes("head")
  
  records <- vector("list", length = length(results))
  
  # Normally there is a single <head>, so this loop runs once
  for (i in seq_along(records)) {
    title <- xml_contents(results[i] %>% html_nodes("title"))[1] %>% html_text(trim = TRUE)
    desc <- html_nodes(results[i], "meta[name=description]") %>% html_attr("content")
    kw <- html_nodes(results[i], "meta[name=keywords]") %>% html_attr("content")
  }
  
  # One-row data frame per site; NA where a tag is missing
  return(data.frame(name = x['name'],
                    url = x['url'],
                    title = ifelse(length(title) > 0, title, NA),
                    description = ifelse(length(desc) > 0, desc, NA),
                    keywords = ifelse(length(kw) > 0, kw, NA)))
})

webpages <- do.call(rbind, webpages)
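
A possible workaround, sketched under assumptions and not tested against these particular sites: as far as I can tell, options(timeout = ...) only governs base R connections, whereas read_html() on a URL string goes through the curl package when it is installed, and curl's connect timeout defaults to 10 seconds, which would match the 10000 milliseconds in the error. The sketch below passes read_html() a curl connection with a longer connect timeout and a browser-like user agent (some sites appear to stall requests from the default one), and wraps the call in tryCatch() so that one unreachable site does not abort the whole run. The helper name read_html_safely, the 60-second default, and the user agent string are hypothetical choices for illustration.

```r
library(rvest)  # provides read_html() and the %>% pipe
library(curl)   # provides curl() connections and new_handle()

# Hypothetical helper: returns the parsed page, or NULL if the fetch fails.
read_html_safely <- function(url, connect_timeout = 60) {
  # connecttimeout is in seconds; both option names are standard libcurl options
  handle <- curl::new_handle(connecttimeout = connect_timeout,
                             useragent = "Mozilla/5.0")
  tryCatch(
    read_html(curl::curl(url, handle = handle)),
    error = function(e) {
      # Report the failure and carry on instead of stopping the loop
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}
```

Inside the apply() call above, read_html(x['url'], encoding = "UTF-8") could then be swapped for read_html_safely(x[['url']]), returning a row of NA values whenever the result is NULL. A longer timeout will not help if a site never answers at all, so the tryCatch() part is probably the more important half.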
