It's easier to figure out what's going on if you supply a reproducible example. Something like the following would do:
library(dplyr)

set.seed(498729)

# Flag each row of df1 by whether it has a match in df2 on the `by` column(s)
can_semi_join <- function(df1, df2, by, result_colname) {
  bind_rows(
    semi_join(df1, df2, by) %>% mutate(!!result_colname := TRUE),   # rows with a match
    anti_join(df1, df2, by) %>% mutate(!!result_colname := FALSE)   # rows without a match
  )
}

X <- tibble(v = sample(LETTERS, 10))
Y <- tibble(v = sample(LETTERS, 10))
can_semi_join(X, Y, "v", "res")
# A tibble: 10 x 2
v res
<chr> <lgl>
1 W TRUE
2 D TRUE
3 B FALSE
4 X FALSE
5 T FALSE
6 Y FALSE
7 J FALSE
8 C FALSE
9 U FALSE
10 P FALSE
You can get the same result more directly, and this version also keeps your rows in their original order (the function above puts all matched rows first):
X %>% mutate(res = v %in% Y$v)
# A tibble: 10 x 2
v res
<chr> <lgl>
1 B FALSE
2 X FALSE
3 T FALSE
4 W TRUE
5 D TRUE
6 Y FALSE
7 J FALSE
8 C FALSE
9 U FALSE
10 P FALSE
This is equivalent to the following, if you want to avoid the df$var notation:
X %>% mutate(res = v %in% (Y %>% pull(v)))
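If you want to keep the interface of can_semi_join() but use the %in% approach, a minimal sketch could look like this. The name flag_matches is made up for illustration, the small X and Y tibbles are toy data rather than the sampled ones above, and it assumes `by` is a single column name given as a string (unlike the join version, it does not handle multi-column keys):

```r
library(dplyr)

# Hypothetical helper: same arguments as can_semi_join(), but row order is preserved.
# Assumes `by` is ONE column name (a string) present in both data frames.
flag_matches <- function(df1, df2, by, result_colname) {
  df1 %>% mutate(!!result_colname := .data[[by]] %in% df2[[by]])
}

# Toy data, chosen only to keep the example small
X <- tibble(v = c("B", "X", "W", "D"))
Y <- tibble(v = c("D", "W", "Q"))

result <- flag_matches(X, Y, "v", "res")
result
```

Here `res` is TRUE for the rows holding "W" and "D" and FALSE for the others, with the rows left in the order they had in X.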
What you seem to be describing is the intersection, which exists in base R as the function intersect(). But that returns the common elements themselves, rather than assigning TRUE or FALSE to each element, which is what you're looking for. Hope that helps.
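To make the difference between intersect() and %in% concrete, here is a small base R sketch on two made-up character vectors (x and y are illustrative, not the data from the question):

```r
# Toy vectors for illustration only
x <- c("B", "X", "T", "W", "D")
y <- c("D", "W", "Q")

# intersect() returns the shared elements themselves
common <- intersect(x, y)
common
# [1] "W" "D"

# %in% returns one logical value per element of x
flags <- x %in% y
flags
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

So intersect() tells you which values are shared, while %in% gives you a vector of the same length as x that you can store as a column.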