Hi,
I'm new to web scraping and am running into problems scraping the Wikipedia pages linked from the page for the name Liam. I want to check each Liam page for the words Irish, Ireland, or Catholic. I think the code works up to the line Liam_urls <- paste0("https://en.wikipedia.org",Liam_urls), but I could be wrong. After that I get one of these two error messages:
Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Error in function (type, msg, asError = TRUE) : <url> malformed
How should I adjust my code?
Thanks for your help.
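# RCurl supplies getURL(), rvest parses the HTML, stringr supplies str_detect()
library(RCurl)
library(rvest)
library(stringr)
# Preview the href of every link whose title starts with "Liam"
html_attr(html_nodes(read_html("https://en.wikipedia.org/wiki/Liam"), "a[title^=Liam]"),"href")
# Store those hrefs
Liam_urls <- html_attr(html_nodes(read_html("https://en.wikipedia.org/wiki/Liam"), "a[title^=Liam]"),"href")
# Drop hrefs that already contain "https", keeping the site-relative paths
Liam_urls <- Liam_urls[which(!str_detect(Liam_urls, "https"))]
Liam_urls
# Prepend the Wikipedia host to build full URLs
Liam_urls <- paste0("https://en.wikipedia.org",Liam_urls)
# Download the raw HTML of each page
scraped_Liam <- sapply(Liam_urls, function(x) getURL(x))
# Flag the pages that mention any of the three terms
results_Liam <- sapply(scraped_Liam, function(x) str_detect(x,"Irish|Ireland|Catholic"))
# Collect the flags in a data frame
results_Liam.df <- data.frame("Hit"=results_Liam, stringsAsFactors = FALSE)
# Share of pages with at least one hit
length(results_Liam.df$Hit[which(results_Liam.df$Hit==TRUE)])/length(results_Liam.df$Hit)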
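
For anyone hitting the same errors, here is one possible direction, as a rough sketch and not a confirmed fix. It rests on two assumptions: that the "tlsv1 alert protocol version" message comes from getURL() offering a TLS version the server no longer accepts, and that the "malformed" message comes from hrefs that are missing or are not /wiki/ paths. The sketch downloads each page with read_html(), which is already used successfully for the index page, instead of getURL(). The helper names keep_article_paths and page_mentions are made up for illustration. Note that it searches the visible page text, not the raw HTML, so the hit rate may differ slightly from the original approach.

```r
library(rvest)
library(stringr)

# Hypothetical helper: keep only site-relative article paths.
# Drops NA hrefs, absolute and protocol-relative links, and duplicates.
keep_article_paths <- function(hrefs) {
  hrefs <- hrefs[!is.na(hrefs)]
  unique(hrefs[str_detect(hrefs, "^/wiki/")])
}

# Hypothetical helper: TRUE/FALSE if the page text matches the pattern,
# NA if the page cannot be read (so one bad URL does not stop the loop).
page_mentions <- function(url, pattern = "Irish|Ireland|Catholic") {
  tryCatch(
    str_detect(html_text(read_html(url)), pattern),
    error = function(e) NA
  )
}

# The scraping itself needs network access, so it only runs interactively.
if (interactive()) {
  index_page <- read_html("https://en.wikipedia.org/wiki/Liam")
  hrefs <- html_attr(html_nodes(index_page, "a[title^=Liam]"), "href")
  Liam_urls <- paste0("https://en.wikipedia.org", keep_article_paths(hrefs))

  # One logical value per URL; failed downloads are NA
  results_Liam <- vapply(Liam_urls, page_mentions, logical(1))

  # Share of successfully read pages with at least one hit
  print(mean(results_Liam, na.rm = TRUE))
}
```

If read_html() fails with the same TLS message, the problem is more likely an outdated SSL library on the machine than anything in the code, and updating R and its curl-related packages would be the next thing to try.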