Merging data frames which might contain the same IDs and values

Hi,
Im trying to create a data frame using 6 other excel worksheets. The problem is that Im trying to match different values from different sheets to one ID and if no value or ID was present in one of the sheets the value for that sheet should be NA/NULL/0 doesnt matter. I dont know how to do that.
I have tried numerous functions and options but I have been always getting stuck on an error that said that length of data frames is different.
My data looks like this:

Excel_sheet_1

> ID1     value1
> ID3     value3
> ID4     value4
Excel_sheet_2

> ID1      value1(might be different that the one in Excel_sheet_1)
> ID2      value2(might be different that the one in Excel_sheet_1)
> ID70     value70
> ID1902   value1902

I want it to look like this:

Final_table

> ID1     value_excel1     value_excel2
> ID3     NA               value_excel2
> ID3     value_excel1     NA               
> ID4     value_excel1     NA               
> ID70    NA               value_excel2
> ID1902  NA               value_excel2

Hi @Reizdos, it would be helpful to see a small sample of the data you used, along with code you tried, but sounds like you want to use the full_join() command from the dplyr package. Here's a possible solution:

library(tidyverse) # contains dplyr package and more

# 'import' excel tables
excel1 <- 
  tibble(
    id = sample(1:10, 8) %>% sort(),
    value = sample(letters, 8)
  )

excel2 <- 
  tibble(
    id = sample(1:10, 4) %>% sort(),
    value = sample(letters, 4)
  )

# inspect
excel1
#> # A tibble: 8 x 2
#>      id value
#>   <int> <chr>
#> 1     1 n    
#> 2     2 e    
#> 3     3 a    
#> 4     4 g    
#> 5     5 m    
#> 6     7 o    
#> 7     8 k    
#> 8     9 p
excel2
#> # A tibble: 4 x 2
#>      id value
#>   <int> <chr>
#> 1     3 b    
#> 2     5 r    
#> 3     7 j    
#> 4     8 g

# rename value columns to match table names
excel1 <- 
  excel1 %>% rename(value1 = value)

excel2 <- 
  excel2 %>% rename(value2 = value)

# match, including all ids, with NA's for missing values
excel1 %>% full_join(excel2) %>% 
  # sort by id
  arrange(id)
#> Joining, by = "id"
#> # A tibble: 8 x 3
#>      id value1 value2
#>   <int> <chr>  <chr> 
#> 1     1 n      <NA>  
#> 2     2 e      <NA>  
#> 3     3 a      b     
#> 4     4 g      <NA>  
#> 5     5 m      r     
#> 6     7 o      j     
#> 7     8 k      g     
#> 8     9 p      <NA>

Created on 2020-03-09 by the reprex package (v0.3.0)

2 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.