How to write NA for missing results in rvest when a node has no content (within a loop), and how to merge the loop variable with the results

Hi, I'm new to R and am trying to fetch tickers/symbols from Yahoo Finance for company names (Adidas, BMW, etc.) stored in a text file, in order to run an event study later. The file contains about 800 names. Some of them can be found on Yahoo and some cannot (that's OK).

My loop works so far, but missing results are not displayed. It only creates a table with row numbers and the results that could be found. I would like to create a list that shows the loop variable i (taken from "firmen") together with the result that was found, or NA if there was no result.

Hope you guys can help me. Thank you!

My code:

library(rvest)

# company_names
firmen <- c(read.table("Mappe1.txt"))

# init
df <- NULL

# loop for search names in Yahoo Ticker Lookup
for(i in firmen){
  # find url
  url <- paste0("https://finance.yahoo.com/lookup/all?s=", i, "/")
  page <- read_html(url,as="text")

  # grab table
  table <- page %>%
    html_nodes(xpath = "//*[@id='lookup-page']/section/div/div/div/div[1]/table/tbody/tr[1]/td[1]") %>%
    html_text() %>%
    as.data.frame()

  # bind to dataframe
  df <- rbind(df, table)

}

Prefer bind_rows() from dplyr/tidyverse over rbind(): it matches columns by name, so it takes less effort to construct suitable rows to bind to an existing table.

You basically need to add some logic that detects whether you are binding a row you got from your query, or a row that you construct to stand in for an absent result.

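library(dplyr) # for bind_rows

# normal case: x holds real rows
(x <- tail(iris))

(head_and_tail <- bind_rows(head(iris),
                            x))

# missing case: the query returned nothing
(x <- NULL)

# bind_rows() silently drops NULL, so no row is added for the missing result
(head_and_null_bad <- bind_rows(head(iris),
                                x))

# construct a stand-in row when x is NULL
(head_and_null_good <- bind_rows(head(iris),
                                 if (is.null(x)) data.frame(Species = "unknown") else x))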

Indeed: identify the problem case with an if statement and handle it, for example by adding a row that represents the missing result.

If you would like to learn R syntax, a combination of the swirl package and the r4ds online book will set you up.
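A minimal sketch of that idea, assuming the scrape returns character(0) when the node is not found. The helper name ticker_row() is hypothetical, and the lookups are simulated with illustrative values so that the example runs offline; in the real loop, `result` would be whatever html_text() returned.

```r
# Hypothetical helper: turn one scraped result into a one-row data frame.
# `result` is character(0) (or "") when nothing was found.
ticker_row <- function(firm, result) {
  if (length(result) == 0 || !nzchar(result[1])) {
    # stand-in row for a company without a match
    data.frame(firmen = firm, tickers = NA_character_, stringsAsFactors = FALSE)
  } else {
    data.frame(firmen = firm, tickers = result[1], stringsAsFactors = FALSE)
  }
}

# Simulated lookups (illustrative values, not fetched from Yahoo)
rows <- list(
  ticker_row("Adidas", "ADS.DE"),
  ticker_row("NoSuchFirm", character(0))
)

# Every row has the same columns, so rbind() or bind_rows() both work here
df <- do.call(rbind, rows)
print(df)
```

Because the company name is stored in the same row as the ticker, the two can no longer get out of step when a lookup fails.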


I solved the first problem: empty nodes (when "i" is not found on the Yahoo page) are now displayed as NA.

Here is the code:

library(rvest)

# company names
firmen <- c(read.table("Mappe1.txt"))

# init
df <- NULL

# loop
for(i in firmen){
  # find url
  url <- paste0("https://finance.yahoo.com/lookup/all?s=", i, "/")
  page <- read_html(url,as="text")
  # grab ticker from yahoo finance
  ticker <- page %>%
    html_nodes(xpath = "//*[@id='lookup-page']/section/div/div/div/div[1]/table/tbody/tr[1]/td[1]") %>%
    html_text(trim=TRUE)
  # use NA when the node is missing or its text is empty
  if (length(ticker) == 0) ticker <- NA_character_
  ticker[!nzchar(ticker)] <- NA_character_
  table <- ticker %>% as.data.frame()

  # bind to dataframe
  df <- rbind(df,table)
}

Now there is just one question left.

How can I merge "df" and "firmen" into one table that has the columns:

"tickers" = df and "firmen" = firmen

The problem is that df has just one column, named ".", which holds the results, while the list firmen contains the companies spread across many columns with just one row.

Basically I need to transform the list "firmen", but I don't know how.

Thank you for the help
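One possible way to do this, sketched under the assumption that df contains exactly one row per company (which should hold once missing results are stored as NA). The inputs below are simulated to match the shapes described above, the ticker values are illustrative, and the names firmen_vec and result are made up for this example.

```r
# Simulated inputs with the shapes described in the question
firmen <- c(read.table(text = "Adidas NoSuchFirm BMW", stringsAsFactors = FALSE))
df <- setNames(data.frame(c("ADS.DE", NA, "BMW.DE"), stringsAsFactors = FALSE), ".")

# Flatten the one-row list into a plain character vector
firmen_vec <- unname(vapply(firmen, as.character, character(1)))

# The two pieces can only be paired up if their lengths agree
stopifnot(length(firmen_vec) == nrow(df))

# df[[1]] pulls out the single column regardless of its odd name "."
result <- data.frame(firmen = firmen_vec, tickers = df[[1]], stringsAsFactors = FALSE)
print(result)
```

The length check matters: if any lookup had been silently dropped instead of recorded as NA, the names and tickers would be misaligned from that point on.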

Hey nirgrahamuk, thank you for your fast response.

Unfortunately, I do not understand how to apply the example to my situation, as I am not yet very familiar with R syntax. Could you perhaps explain how I can best apply the code you shared to my example?

As I understand it, I need to add an if-else statement that fills an empty row with "unknown" and otherwise uses the result.

Thank you

Thank you for the recommendations.
I just installed swirl and bought the book!

I tried to implement an if-else statement in the loop, but I could not figure out how to add a row if the result of the node is empty. Do you have any further guidance for me?

Thank you and best regards,
Sinatra
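For the question above, an alternative to adding a row inside the loop is to start with a vector that is NA everywhere and overwrite only the positions where something was found. This is a rough sketch, not the method from the replies: lookup_ticker() is a hypothetical stand-in for the rvest query (it returns character(0) when there is no match), and the ticker values are illustrative.

```r
# Hypothetical stand-in for the rvest query, so the sketch runs offline
lookup_ticker <- function(firm) {
  known <- c(Adidas = "ADS.DE", BMW = "BMW.DE")  # illustrative values only
  if (firm %in% names(known)) unname(known[firm]) else character(0)
}

firmen <- c("Adidas", "NoSuchFirm", "BMW")

# Start with NA for every company
tickers <- rep(NA_character_, length(firmen))

for (k in seq_along(firmen)) {
  result <- lookup_ticker(firmen[k])
  # overwrite the NA only when the lookup returned non-empty text
  if (length(result) > 0 && nzchar(result[1])) {
    tickers[k] <- result[1]
  }
}

result_table <- data.frame(firmen = firmen, tickers = tickers, stringsAsFactors = FALSE)
print(result_table)
```

With this layout no row ever needs to be added for a missing result, since the NA is already in place before the lookup runs.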