Help with Scraping

Please see this previous post:

I'm trying to turn what I scrape from the internet into a data frame with columns and rows.

You can specify the start position of the scrape in the URL with the start parameter, e.g. start=0. If you go to that URL and then to the very next page, you'll notice the URL is exactly the same except that it has start=10 instead of start=0. Using this, you can manipulate the URL to iterate over pages with this code:


library(rvest) 


url_start <- "https://scholar.google.com/scholar?start=" # this is the portion of the URL before the start result
url_end <- '&q=Eriophyidae&hl=en&as_sdt=0,6' # this is the portion of the URL after the start result

list_html <- vector(mode = 'list', length = 10) # pre-allocate space for a list to store 10 pages

# This for loop iterates over the first 10 pages.
# paste0() and 10 * (i - 1) do the work here: they set start = 0, 10, 20, and so on.
for (i in seq_along(list_html)) {
  list_html[[i]] <- paste0(url_start, 10 * (i - 1), url_end) %>%
    read_html()
}

## From here you have a list of all pages. Map html_nodes() and html_text() over this list
## to get the text of all of your results.
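
To get from that list of pages to a data frame, something along these lines might work. This is only a rough sketch: the CSS classes (.gs_ri, .gs_rt, .gs_a) are my assumption about how the results are marked up and may not match the live page, page_to_df is a made-up helper name, and the two small pages below are hand-written stand-ins so the example runs without hitting the site. In practice you would pass list_html instead of pages.

```r
library(rvest)

# Stand-in pages so the sketch runs offline; the class names are assumptions.
pages <- list(
  read_html('<html><body>
    <div class="gs_ri"><h3 class="gs_rt">Title A</h3><div class="gs_a">Author A</div></div>
    <div class="gs_ri"><h3 class="gs_rt">Title B</h3><div class="gs_a">Author B</div></div>
  </body></html>'),
  read_html('<html><body>
    <div class="gs_ri"><h3 class="gs_rt">Title C</h3><div class="gs_a">Author C</div></div>
  </body></html>')
)

# Hypothetical helper: turn one page into a data frame with one row per result.
page_to_df <- function(page) {
  results <- html_nodes(page, ".gs_ri")  # one node per search result
  data.frame(
    # html_node() returns one match per result, so the columns stay aligned
    title   = html_text(html_node(results, ".gs_rt"), trim = TRUE),
    authors = html_text(html_node(results, ".gs_a"), trim = TRUE),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every page and stack the pieces into one data frame.
results_df <- do.call(rbind, lapply(pages, page_to_df))
print(results_df)
```

Selecting each result container first and then pulling the fields out of it with html_node() keeps titles and authors lined up row by row, even if a result happens to be missing one of the fields.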

This is great! Thank you for your help. I will incorporate it.
