Webscraping with rvest and dplyr

I am trying to scrape the keywords of trending topics from the Google News website in RStudio. I used both the rvest and dplyr packages. I also used SelectorGadget in Google Chrome to find the tags for the keywords.
Here is my code:

library(rvest)
library(dplyr)
google.news<-read_html("https://news.google.com/topstories?hl=en-NG&gl=NG&ceid=NG:en")
google.news %>%
+html_nodes(".boy4he") %>%
+html_text()

But when I run the code, I get the error message:

> google.news<-read_html("https://news.google.com/topstories?hl=en-NG&gl=NG&ceid=NG:en")
> google.news %>%
+ +html_nodes(".boy4he") %>%
+ +html_text()
Error in UseMethod("xml_find_all") : 
  no applicable method for 'xml_find_all' applied to an object of class "character"

Can somebody please advise me on what could be wrong? Thanks

You have erroneous plus symbols (+) at the start of the piped lines in your code.
Remove them.
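
For illustration, here is a minimal sketch of the same pipeline without the plus signs. It runs against a small inline HTML snippet instead of the live page, so the markup, the link targets and the "Topic A"/"Topic B" text are assumptions made up for the example; only the .boy4he selector comes from your code.

```r
library(rvest)

# Hypothetical stand-in page so the sketch runs without a network call.
page <- read_html(paste0(
  '<div>',
  '<a class="boy4he" href="/topics/a">Topic A</a>',
  '<a class="boy4he" href="/topics/b">Topic B</a>',
  '</div>'
))

# At the console, a leading "+" is only the continuation prompt.
# It must not be typed or pasted as part of the code itself.
topics <- page %>%
  html_nodes(".boy4he") %>%
  html_text()

print(topics)
```

With the plus signs left in, R treats each one as a unary operator wrapped around the call that follows it, so html_nodes() ends up being called on the selector string rather than on the page, which is consistent with the "applied to an object of class "character"" error.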

Thanks. I removed them and ran the code again, and this is what I got:

 google.news<-read_html("https://news.google.com/topstories?hl=en-NG&gl=NG&ceid=NG:en")
> google.news %>%
+ html_nodes(".boy4he") %>%
+ html_text
 [1] "" "" "" "" "" "" "" "" "" ""
>

It still did not scrape any data from google news.

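library(rvest)
library(tidyverse)

google.news <- read_html("https://news.google.com/topstories?hl=en-NG&gl=NG&ceid=NG:en")

# Select the matching nodes, then pull every attribute of each node.
mynodes <- google.news %>% html_nodes(".boy4he")
my_attrs <- mynodes %>% html_attrs()

# Build one row per node from its first three attributes, taken by position.
(my_result <- purrr::map_dfr(my_attrs,
               ~tibble(tag = .[[1]], url = .[[2]], name = .[[3]])))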
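
The ten empty strings suggest that the matched nodes exist but contain no text, so the information has to come from their attributes. As a rough sketch, the attributes can also be read by name with html_attr() rather than by position, which gives an ordinary data frame that dplyr verbs can work on. The inline markup and the attribute names (href, aria-label) are assumptions for the example; check what the real nodes carry with html_attrs() first.

```r
library(rvest)

# Hypothetical markup: links that carry their data in attributes and have no text.
page <- read_html(paste0(
  '<div>',
  '<a class="boy4he" href="./topics/a" aria-label="Topic A"></a>',
  '<a class="boy4he" href="./topics/b" aria-label="Topic B"></a>',
  '</div>'
))

nodes <- html_nodes(page, ".boy4he")

# html_text() returns "" for these nodes, so read named attributes instead.
topics <- data.frame(
  name = html_attr(nodes, "aria-label"),
  url  = html_attr(nodes, "href"),
  stringsAsFactors = FALSE
)

print(topics)
```

Reading by name avoids relying on the attributes always appearing in the same order, and html_attr() returns NA for a node that lacks the attribute instead of failing.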

Thanks, nirgrahamuk. This code does scrape some data, but it does not present it in the format I want, one that makes analysis easier, as it would be if it were scraped with dplyr. Thanks again for your help; it is very much appreciated.

You're welcome,

I'm afraid I don't know what you mean by that...
