jak213,
It's a bit tricky to write a good mutate() statement without knowing what the actual data might be like.
However, if you do the full join I suggested above, you would get:
v1 v2 rule
1 a b COE
2 b d COE
3 a e COE
4 c e <NA>
5 g d <NA>
You see the "NA" under rule because those rows were in df2, but not df1, and df1 does not have a "rule" column.
If the data were to be like this you could replace the NA with a 0, in a variety of ways. For example:
mutate(rule = ifelse(is.na(rule), "0", rule)
You could make a similar mutate() statement to generate your lead variable.
If I were doing this (but this is biased toward the work I do) I would add a marker to each dataframe, then join them. Something like this:
df1 <- tibble(v1 = c("a", "b", "a"),
v2 = c("b", "d", "e"),
rule = c("COE", "COE", "COE"),
df1 = (TRUE))
df2 <- tibble(v1 = c("a", "c", "g"),
v2 = c("b", "e", "d"),
df2 = TRUE)
df <- full_join(df1,
df2,
by = c("v1","v2"))
(I'm also using tibble() to create the dataframe/tibble, since again, this is what I would do.)
Anyway, you get this:
v1 v2 rule df1 df2
<chr> <chr> <chr> <lgl> <lgl>
1 a b COE TRUE TRUE
2 b d COE TRUE NA
3 a e COE TRUE NA
4 c e NA NA TRUE
5 g d NA NA TRUE
Then you can write your mutate functions along the lines of:
case_when(df1 == TRUE & df2 == TRUE ~ "Result you want when v1 and v2 combination is in both dataframes",
df1 == TRUE & df2 == FALSE ~ "Result you want when v1 and v2 combination is ONLY in df1"
...
I hope that helps.
Luke