I'm struggling with this type of join in R, and I'm not quite sure how to do it.
My data frame is like this:
all_data <- structure(list(pk1 = c("1", "1", "1"),
pk2 = c("101", "101", "101"),
pk3 = c(111111, 222222, 333333),
date_created = structure(c(18494, 18527, 18564), class = "Date")),
row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
What I'd like to do is try to join it on itself, but only join if there is a most recent transaction that exists.
Ideally my output would look like this:
pk1 pk2 pk3.x date_created.x pk3.y date_created.y
<chr> <chr> <dbl> <date> <dbl> <date>
1 1 101 111111 2020-08-20 NA NA
2 1 101 222222 2020-09-22 111111 2020-08-20
3 1 101 333333 2020-10-29 222222 2020-09-22
I struggle with this because I can't indicate a conditional join (i.e. join with these primary keys only if the date_created.x > date_created.y). I would appreciate a dplyr approach, although at this point, I feel that data.table might be the route I need to pursue.