Help scraping an HTML document

Good morning community,

Can someone help me understand how to import into R the table in this HTML document?

"https://www.agerborsamerci.it/listino/listino.html"

Thanks a lot.

Start like this:

library(rvest)
library(tidyverse)
url <- "https://www.agerborsamerci.it/listino/web.htm"
htab <- url %>%
  read_html() %>%
  html_nodes(xpath="//table") %>%
  html_table(fill=TRUE)
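
Note that `htab` is a list with one element per `<table>` found on the page, not a single data frame. As a rough illustration, here is a minimal offline sketch; the inline HTML and the names `page`, `tabs`, and `prices` are made up for the example and are not taken from the real page:

```r
library(rvest)

# Hypothetical inline page with two tables, so the example runs offline.
page <- minimal_html('
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table>
    <tr><th>item</th><th>min</th><th>max</th></tr>
    <tr><td>Wheat</td><td>210</td><td>215</td></tr>
  </table>')

# Same pattern as above: select every <table>, then parse each one.
tabs <- html_table(html_nodes(page, xpath = "//table"))

length(tabs)          # one list element per <table>
sapply(tabs, ncol)    # column counts help identify the table you want
prices <- tabs[[2]]   # extract a single table by position
```

If the real page turns out to contain more than one table, picking the right element with `[[ ]]` is probably safer than converting the whole list at once.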

Thanks a lot for the reply @nirgrahamuk

I added more code to get a better view of the data. You can reproduce it by running the following code:

library(rvest)
library(tidyverse)
library(DT)

# Read every <table> on the page into a list of data frames
url <- "https://www.agerborsamerci.it/listino/web.htm"
htab <- url %>%
  read_html() %>%
  html_nodes(xpath="//table") %>%
  html_table(fill=TRUE)

htab_new <- as.data.frame(htab)

# Keep the first seven columns and drop the five leading rows
htab_new <- htab_new[, 1:7]
htab_new <- htab_new[-(1:5), ]

# The first remaining row supplies the two date labels for the column names
dates_one <- htab_new[1, 2]
dates_two <- htab_new[1, 4]

names(htab_new) <- c("Object",
                     paste0(dates_one, " - Min"), paste0(dates_one, " - Max"),
                     paste0(dates_two, " - Min"), paste0(dates_two, " - Max"),
                     "Difference min", "Difference max")

# Drop the two header rows now that the column names are set
htab_new <- htab_new[-(1:2), ]

# Drop rows 20 to 23, then rows 144 to 148 of what remains
htab_new <- htab_new[-(20:23), ]
htab_new <- htab_new[-(144:148), ]

# Drop the last three rows
htab_new <- head(htab_new, -3)

datatable(htab_new)

However, I notice that in some rows the same content is repeated across multiple columns.

Could you please tell me how to solve this, so that the content appears in only one column instead of in all of them?

Thank you very much, I appreciate it
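
One possible approach, offered as a sketch rather than a definitive fix: the repeated rows are likely section headings stored in cells that span several columns, and `html_table()` copies a spanned cell into every column it covers. Assuming that is the cause, such rows can be detected by checking whether every column holds the same value, and the copies can then be blanked out. The data frame `toy` below is made up to stand in for `htab_new`, and `is_spanned` is a hypothetical helper name:

```r
# Toy data frame standing in for htab_new (values are invented for illustration).
toy <- data.frame(
  Object = c("CEREALS", "Wheat", "Maize", "FLOURS", "Flour type 00"),
  Min    = c("CEREALS", "210",   "190",   "FLOURS", "400"),
  Max    = c("CEREALS", "215",   "195",   "FLOURS", "410"),
  stringsAsFactors = FALSE
)

# A row counts as "spanned" when every column repeats the first column's value.
# isTRUE() makes rows containing NA count as not spanned.
is_spanned <- apply(toy, 1, function(r) isTRUE(all(r == r[1])))

# Keep the text in the first column only and blank out the repeats.
toy[is_spanned, -1] <- NA

toy
```

This assumes all columns are character, which is usually the case when heading rows are mixed in with the numbers. To apply it to the real data, replace `toy` with `htab_new`. A data row whose values all happen to be identical would also be blanked, so check the flagged rows before overwriting them.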
