For loop with rvest for scraping

Hello folks,
I would like to write a for loop to scrape (with rvest) the H1 tag from a list of URLs.
I tried the following code, but it doesn't work.
Can somebody help me? :slightly_smiling_face:

Thanks!
library(rvest)
library(readr)
library(tidyverse)
library(XML)
library(httr)

# Load the list of URLs
urls <- c("Coronavirus : Actualités, vidéos, images et infos en direct - 20 Minutes")

# Create an empty list
tbl <- list()

# Start the for loop
for (i in 1:length(urls)) {
  tbl[[i]] <- urls[[i]] %>% # tbl[[i]] assigns each H1 from urls as an element in the tbl list
    read_html() %>%
    html_nodes("h1") %>%
    html_text() %>%
    if (dim(tbl)[i] == 0){
      i = i+1
    }}
tbl

Error message in the console:
Error in if (.) dim(tbl)[i] == 0 else { :
argument is not interpretable as logical

Hi,

Your code looks confusing and I can't follow the process. Could you please provide us with a reprex? A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:

Good luck,
PJ

Hello @pieterjanvc,
Thank you for your reply.
I made some modifications to clarify my code.
Could you please tell me if you need anything else?

Thanks,
Pierre

There is no need to manually update the index (and the syntax is also wrong). If you remove this part, your code works, but I would like to propose this solution instead of a for loop:

library(tidyverse)
library(rvest)

urls <- c("https://www.20minutes.fr/dossier/coronavirus","https://www.20minutes.fr/economie/")

tbl <- map(urls, ~ {
    .x %>%
        read_html() %>%
        html_nodes("h1") %>%
        html_text()
})

tbl
#> [[1]]
#> [1] "Coronavirus"
#> 
#> [[2]]
#> [1] "Économie"

Created on 2020-03-13 by the reprex package (v0.3.0.9001)
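
If you would rather keep the for loop, here is a minimal sketch of what it could look like with the manual index update removed. In the original, the trailing `%>%` pipes the result of `html_text()` into `if` as its condition, which is why R complains that the argument is not a logical value. Note that `pages` is a hypothetical stand-in I made up so the example runs without network access: it holds literal HTML strings, and in practice you would loop over your `urls` vector instead. Returning `NA` for a page with no H1 is also just one possible choice.

```r
library(rvest)

# Hypothetical stand-ins for downloaded pages: literal HTML strings are used
# here so the sketch runs offline. Replace `pages` with your `urls` vector.
pages <- c(
  "<html><body><h1>Coronavirus</h1></body></html>",
  "<html><body><p>No heading here</p></body></html>"
)

# Pre-allocate the result list; seq_along() is also safe for an empty input.
tbl <- vector("list", length(pages))

for (i in seq_along(pages)) {
  h1 <- pages[[i]] %>%
    read_html() %>%
    html_nodes("h1") %>%
    html_text()

  # for() advances i on its own; only the "no <h1> found" case needs handling.
  # NA is stored instead of NULL because assigning NULL would drop the element.
  tbl[[i]] <- if (length(h1) == 0) NA_character_ else h1
}

tbl
```

The `if` now sits on its own line, outside the pipe, so it tests the length of the scraped result rather than receiving it as a condition.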
