Add column by ID into another df with duplicate IDs

I would like to join the column Product from my original dataset into another df with tokenised words based on Complaint.ID. I am not sure how to go about it because there are duplicate IDs in df2.

## Reprex
# original dataset
df1 <- tibble::tribble(
      ~Complaint.ID,        ~Product,
                  1,         "Credit card",
                  2,         "Student loan",
                  3,         "Mortgage",
                  4,         "Car Insurance",
                  5,         "Credit card",
                  6,         "Mortgage",
    
    )

# dataset of tokenised words
df2 <- tribble(
  ~Complaint.ID,          ~word,
            1,         "come",
            1,    "attention",
            1,         "citi",
            2,        "group",
            2,     "actively",
            3,   "attempting",
            3,    "interfere",
            3,       "rights",
            3,   "guaranteed",
            3, "constitution"
)

> df1
# A tibble: 6 x 2
  Complaint.ID Product      
         <dbl> <chr>        
1            1 Credit card  
2            2 Student loan 
3            3 Mortgage     
4            4 Car Insurance
5            5 Credit card  
6            6 Mortgage     
> df2
# A tibble: 10 x 2
   Complaint.ID word        
          <dbl> <chr>       
 1            1 come        
 2            1 attention   
 3            1 citi        
 4            2 group       
 5            2 actively    
 6            3 attempting  
 7            3 interfere   
 8            3 rights      
 9            3 guaranteed  
10            3 constitution

This should be what it looks like in the end:

> joined_df
# A tibble: 10 x 3
   Complaint.ID word         Product        
          <dbl> <chr>        <chr>
 1            1 come         Credit card
 2            1 attention    Credit card
 3            1 citi         Credit card
 4            2 group        Student loan
 5            2 actively     Student loan
 6            3 attempting   Mortgage
 7            3 interfere    Mortgage
 8            3 rights       Mortgage
 9            3 guaranteed   Mortgage
10            3 constitution Mortgage

I've tried out the different dplyr joins but they are not working for me. Thanks for any help!

I'm not sure what you tried but I'd suggest using left_join as follows:

library(tidyverse)

df1 <- tibble::tribble(
  ~Complaint.ID,        ~Product,
  1,         "Credit card",
  2,         "Student loan",
  3,         "Mortgage",
  4,         "Car Insurance",
  5,         "Credit card",
  6,         "Mortgage",
  
)

# dataset of tokenised words
df2 <- tribble(
  ~Complaint.ID,          ~word,
  1,         "come",
  1,    "attention",
  1,         "citi",
  2,        "group",
  2,     "actively",
  3,   "attempting",
  3,    "interfere",
  3,       "rights",
  3,   "guaranteed",
  3, "constitution"
)

joined_df <- df2 %>% left_join(df1, by="Complaint.ID")
joined_df
#> # A tibble: 10 x 3
#>    Complaint.ID word         Product     
#>           <dbl> <chr>        <chr>       
#>  1            1 come         Credit card 
#>  2            1 attention    Credit card 
#>  3            1 citi         Credit card 
#>  4            2 group        Student loan
#>  5            2 actively     Student loan
#>  6            3 attempting   Mortgage    
#>  7            3 interfere    Mortgage    
#>  8            3 rights       Mortgage    
#>  9            3 guaranteed   Mortgage    
#> 10            3 constitution Mortgage

Created on 2020-12-15 by the reprex package (v0.3.0)

Somehow I completely overlooked this. Thanks!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.