The SO thread posted by Yarnabrina also shows how base R can prevent NA from matching itself. dplyr has something similar (detailed further below in this post). The match function has an incomparables argument, so we can do this:
match(NA, NA)
# [1] 1
match(NA, NA, incomparables = NA)
# [1] NA
The base function merge has the same argument:
d1 <- data.frame(x = c(1, NA), y = c("a", "b"))
d2 <- data.frame(x = c(1, NA), z = c("A", "B"))
merge(d1, d2)
#    x y z
# 1  1 a A
# 2 NA b B
merge(d1, d2, incomparables = NA)
#   x y z
# 1 1 a A
The problem is that merge only supports incomparables when merging on a single column. With two or more key columns it throws an error:
d1 <- data.frame(x1 = c(1, NA), x2 = c(TRUE, NA), y = c("a", "b"))
d2 <- data.frame(x1 = c(1, NA), x2 = c(TRUE, NA), z = c("A", "B"))
merge(d1, d2, incomparables = NA)
# Error in merge.data.frame(d1, d2, incomparables = NA) :
# 'incomparables' is supported only for merging on a single column
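One possible base R workaround, sketched here as an assumption rather than something from the linked thread, is to drop rows with an NA in any key column before merging. For an inner join this has the same effect as never letting NA match. The names keys, d1_complete and d2_complete are just illustrative.

```r
d1 <- data.frame(x1 = c(1, NA), x2 = c(TRUE, NA), y = c("a", "b"))
d2 <- data.frame(x1 = c(1, NA), x2 = c(TRUE, NA), z = c("A", "B"))

# Key columns are the ones the two data frames share (merge's default 'by')
keys <- intersect(names(d1), names(d2))

# Keep only rows whose key columns contain no NA, so an NA key can never match
d1_complete <- d1[complete.cases(d1[keys]), ]
d2_complete <- d2[complete.cases(d2[keys]), ]

# Multi-column inner join on the filtered data; no 'incomparables' needed
res <- merge(d1_complete, d2_complete, by = keys)
res
```

Only the row with x1 = 1 and x2 = TRUE survives. Note that this only mirrors "never match NA" for inner joins: with all = TRUE (or all.x / all.y) the filtered-out rows would be lost from the result instead of being kept as unmatched rows.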
In dplyr, the *_join functions take an na_matches argument. From the join.tbl_df documentation:
na_matches
Use "never" to always treat two NA or NaN values as different, like joins for database sources, similarly to merge(incomparables = FALSE). The default, "na", always treats two NA or NaN values as equal, like merge(). Users and package authors can change the default behavior by calling pkgconfig::set_config("dplyr::na_matches" = "never").
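As a quick illustration of that argument, here is a minimal sketch that reuses the two-column example from above with inner_join. It assumes a dplyr version that supports na_matches for data frame joins; joined_default and joined_never are just illustrative names.

```r
library(dplyr)

d1 <- data.frame(x1 = c(1, NA), x2 = c(TRUE, NA), y = c("a", "b"))
d2 <- data.frame(x1 = c(1, NA), x2 = c(TRUE, NA), z = c("A", "B"))

# Default (na_matches = "na"): the NA/NA key row matches itself, so both rows join
joined_default <- inner_join(d1, d2, by = c("x1", "x2"))

# na_matches = "never": NA keys are never treated as equal,
# so only the row with complete keys remains
joined_never <- inner_join(d1, d2, by = c("x1", "x2"), na_matches = "never")

nrow(joined_default)
nrow(joined_never)
```

Unlike merge with incomparables, this works with more than one key column, so it covers the multi-column case that base R rejects.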