Compare two tables, keep all rows in the table 2 while individuals in table 1 appear in table 2 (appearing with repetition)

Hello everyone,

I have to match individuals in my first table to those in the second. Individuals in the first appear only onetime in this one but in the second they could appear many times. Also, in the second table there are some individuals who don't appear in the first one. And my goal is to delete the row of these individuals who doesn't appear in the first table, hence keep only those I have in the first one.
My two tables are so huge, I can't load them here, sorry.Could anyone help me please?

`(company_top10 <- tibble(Company_id = c(rand(1:2000)))``
(patent_portfolio <- tibble(Company_id = c(1,1,1, 2,2, ... , 2000), Patent_id = c ( rand ("A":"Z")))
I tried with this:
`for (i in company_top10$Company_id){
  patent_pf_top10 <- patent_portfolio %>% filter(Company_id == i)  

But the problem is tha, it remains only the last filter. Thank you for your answer.

you may not be able to load them in their entirety, but you could share representative data, you could simply invent example data that would conform to the description you wrote / your issue.

On its face, it seems like you want to do a simple inner join. keeping rows where individuals in both tables match, and disregarding the rest. here is an example of an inner join using tidyverse/dplyr


(df1 <- tibble(id = 1:3))
(df2 <- tibble(id = c(1, 1, 2,4), y = c("first", "second", "third","fourth")))

df1 %>% inner_join(df2)
1 Like

Yes you are right, i shall to tray to give an example.

Thank you but it gives an unxpected result. I repeate each row in n times in the first dataframe

perhaps you want something like this:


(df1 <- tibble(id = 1:3))
(df2 <- tibble(id = c(1, 1, 2,4), y = c("first", "second", "third","fourth")))

(unique_ids_in_both <- df1 %>% inner_join(df2) %>% distinct(id))

df1 %>% inner_join(unique_ids_in_both)
1 Like

Thank you so much but it not that. But good news, I found it alon. I would like to share it here.

  patent_pf_top10 <- tibble()
  for (i in company_top10$Company_id){
    patent_pf_top10 %<>% rbind(patent_portfolio %>% filter(Company_id == i))

May be there is alrady a function who do that yet.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.