dplyr left_join missing values

I would like to fill out the missing values in df1 using df2:

df1 <- data.frame(
  ID = c("1", "2", "3", "4"),
  value = c("A", "B", NA, NA) 
)

df2 <- data.frame(
  ID = c("3", "4"),
  value = c("C", "D") 
)

I would like to have:

  ID value 
  1       A    
  2       B    
  3       C       
  4       D       

Using left_join() from the dplyr package produces:

left_join(df1, df2, by=c("ID"))

  ID value.x value.y
  1       A    <NA>
  2       B    <NA>
  3    <NA>       C
  4    <NA>       D

What is the correct dplyr method to perform this operation?

Use rows_update()

library(dplyr)

df1 <- data.frame(
    ID = c("1", "2", "3", "4"),
    value = c("A", "B", NA, NA) 
)

df2 <- data.frame(
    ID = c("3", "4"),
    value = c("C", "D") 
)

df1 %>% 
    rows_update(df2, by = "ID")
#>   ID value
#> 1  1     A
#> 2  2     B
#> 3  3     C
#> 4  4     D

Created on 2020-10-13 by the reprex package (v0.3.0)

3 Likes

Great. Thanks a million!

@andresrcs you’re awesome. Do remember that

I noticed that rows_update does not work with duplicate values:

df1 <- data.frame(
  ID = c("1", "2", "2", "3", "4"),
  value = c("A", "B", "B", NA, NA) 
)

df2 <- data.frame(
  ID = c("3", "4"),
  value = c("C", "D") 
)

df1 %>% 
  rows_update(df2, by = "ID")

Fehler: `x` key values are not unique.

I now use a left_join and collapse both columns:

df1 %>% 
  left_join(df2, by="ID") %>%
  mutate(
    value=if_else(is.na(value.x), value.y, value.x)
    ) %>%
  select(ID, value)

  ID value
1  1     A
2  2     B
3  2     B
4  3     C
5  4     D

Is there any better solution to this?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.