Newbie struggling with empty values in web scraping

Hello there,

I'm a complete newbie to programming and web scraping with R. I've watched some easy tutorials and am now starting to collect data from a German football manager site. I want to scrape three values (name, season and points). Seasons and points can occur multiple times, because players normally play several seasons in the Bundesliga. Because the URL ends in "profile?id=", every player profile gets a number, from 1 to 30k or so. That means every player in the Bundesliga after 2006 is registered there somehow. Hopefully.

I am now facing two main problems:

  1. Not every number in this range is connected to a profile. The gaps with empty profiles are random, and such a page shows an error called "player not found". The error message is not in the same HTML node as the desired values.

  2. Sometimes a profile is linked to a player's name but has no season or points value.

Ideally, I would like to fill these empty fields with "NA" in both cases and move on to the next profile.

Here is the code for the first 50 profiles (be gentle, it's my first project :smiley:):

library(dplyr)
library(rvest)

playerinf = data.frame()

for (page_result in seq(from = 1, to = 50, by = 1)) {
  link = paste0("https://stats.comunio.de/profile?id=", page_result)
  code = read_html(link)

  Name = code %>% html_nodes("#content .bold") %>% html_text()
  Season = code %>% html_nodes(".nopadding td:nth-child(1)") %>% html_text()
  Points = code %>% html_nodes(".nopadding td+ td") %>% html_text()

  playerinf = rbind(playerinf, data.frame(Name, Season, Points))
}

Error message:

Error in data.frame(Name, Season, Points) : 
  arguments imply differing number of rows: 1, 0

Thanks in advance for your help!
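
The error can be reproduced without any scraping, which may help to see where it comes from. The following is only a minimal sketch with made-up placeholder values (they are not taken from the site): data.frame() receives a name of length 1 and two empty vectors, so the columns cannot be lined up.

```r
# Placeholder values standing in for one scraped profile (hypothetical data)
Name   <- "Some Player"
Season <- character(0)  # an empty result, as when no node matches
Points <- character(0)

# The three vectors have lengths 1, 0 and 0
print(c(Name = length(Name), Season = length(Season), Points = length(Points)))

# data.frame() refuses to combine them; capture the message instead of stopping
result <- tryCatch(
  data.frame(Name, Season, Points),
  error = function(e) conditionMessage(e)
)
print(result)
```

The exact wording of the printed message depends on the language R is running in.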

I solved it myself:

library(dplyr)
library(rvest)

playerinf = data.frame()

for (page_result in seq(from = 1, to = 40000, by = 1)) {
  link = paste0("https://stats.comunio.de/profile?id=", page_result)
  code = read_html(link)

  Name = code %>% html_nodes("#content .bold") %>% html_text()
  Season = code %>% html_nodes(".nopadding td:nth-child(1)") %>% html_text()
  Position = code %>% html_nodes("td:nth-child(1) tr:nth-child(3) .left+ td") %>% html_text()
  Points = code %>% html_nodes(".nopadding td+ td") %>% html_text()

  # Use NA when nothing was found. The test has length 1, so ifelse()
  # returns a single value: only the first match of each field is kept.
  playerinf = rbind(playerinf, data.frame(
    Name = ifelse(length(Name) == 0, NA, Name),
    Season = ifelse(length(Season) == 0, NA, Season),
    Position = ifelse(length(Position) == 0, NA, Position),
    Points = ifelse(length(Points) == 0, NA, Points)))

  # The CSV is rewritten on every iteration, so partial results are on disk
  # if the loop stops early.
  write.csv(playerinf, "PlayerInfomartionComStat.csv")
}
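
One thing to be aware of with the solution above: ifelse() returns a result as long as its test, and the test here has length 1, so each profile ends up with a single row even when a player has several seasons. The following is a hedged sketch of one way to keep every season, not a tested replacement for the scraper. The helpers na_if_empty() and profile_rows() and the fake_pages data are hypothetical names and values made up for illustration, and the sketch assumes that Season and Points come back in the same order.

```r
# Hypothetical helper: return NA for an empty vector, otherwise the vector itself
na_if_empty <- function(x) if (length(x) == 0) NA_character_ else x

# Hypothetical helper: build one row per season for a single profile
profile_rows <- function(name, season, points) {
  season <- na_if_empty(season)
  points <- na_if_empty(points)
  # Pad the shorter vector with NA so both columns have the same length
  n <- max(length(season), length(points))
  length(season) <- n
  length(points) <- n
  data.frame(
    Name = na_if_empty(name)[1],  # a single name, recycled over all rows
    Season = season,
    Points = points,
    stringsAsFactors = FALSE
  )
}

# Made-up stand-ins for three scraped pages: a full profile,
# an empty profile, and a name without seasons
fake_pages <- list(
  list(name = "Player A", season = c("2006/07", "2007/08"), points = c("41", "58")),
  list(name = character(0), season = character(0), points = character(0)),
  list(name = "Player B", season = character(0), points = character(0))
)

# Collect the pieces in a list and bind them once at the end
rows <- lapply(fake_pages, function(p) profile_rows(p$name, p$season, p$points))
playerinf <- do.call(rbind, rows)
print(playerinf)
```

In the real loop, the three arguments would be the Name, Season and Points vectors returned by html_text(). Binding once at the end also avoids growing the data frame with rbind() on each of the 40000 iterations.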
