substitute an innested for loop

Hi,
I've this piece of code

for (i in 1:nrow(data)) { 
  
    for (j in 1:nrow(mydata)){
      if(data[i,2]==mydata[j,2])
      {data[i,8]= mydata[j,3]
     }
    
  }
}

I've a very long database (data) where there are in the second column names of firms, which the same name is repeated a lot of times as they are time series, and i have to add in the 8th column the sector of the firm which is held in another database (mydata) that contains the names of the firms plus the sector, but in this case the names are not repeated as i don't have any time information.
The probelm is that i don't know for each firms how many times the name is repeated so i can't do anything standard and with this solution should be optimal, but the databases are very long, it's needing an eternity. Is there any faster solution (with a first database of 600k obseravation and almost 17k firms in the other have already passed 2 days without ending the loops and i have two more databases of 1 and 2 million)?
Thank you

what you describe sounds like a simple join.

library(tidyverse)

(data <- tibble(
  fact=c(1,2,3,4),
  firm=c("a","a","b","b")
))

(firmsec <- tibble(
  firm = c("a","b"),
  sector= c(123,99)
))

left_join(data,firmsec)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.