I'm a little confused as to what the relationship is between DF1 and DF2 (note that I've created a reprex below, which makes it much easier for others to help you). With the exception of one date, it looks like the product and return_dates are combined as primary keys, but then you aggregate across them in the last table?
Do you have unique IDs for the transactions? It might help a bit.
suppressPackageStartupMessages(library(tidyverse))
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
df1 <- tibble::tribble(
~product, ~return_date, ~total_bookings,
"product_A", "2018-05-26", 4,
"product_A", "2019-01-24", 9,
"product_A", "2019-10-25", 5,
"product_B", "2018-09-18", 12,
"product_B", "2018-10-25", 30,
"product_B", "2019-03-16", 4
) %>%
mutate(return_date = parse_date(return_date, format = "%Y-%m-%d"))
df2 <- tibble::tribble(
~product, ~date_last_return, ~purchase_date, ~bookings,
"product_A", "2018-10-25", "2019-02-12", 2,
"product_A", "2019-01-24", "2019-06-14", 2,
"product_A", "2018-05-26", "2019-06-28", 2,
"product_B", "2018-09-18", "2018-11-01", 2,
"product_B", "2018-10-25", "2019-02-06", 1,
"product_B", "2019-03-16", "2019-06-13", 2
) %>%
mutate(date_last_return = parse_date(date_last_return, format = "%Y-%m-%d"),
purchase_date = parse_date(purchase_date, format = "%Y-%m-%d"))
df1 %>%
left_join(df2, by = c("product", c("return_date" = "date_last_return"))) %>%
mutate("diff_date" = purchase_date - return_date)
#> # A tibble: 6 x 6
#> product return_date total_bookings purchase_date bookings diff_date
#> <chr> <date> <dbl> <date> <dbl> <drtn>
#> 1 product_A 2018-05-26 4 2019-06-28 2 398 days
#> 2 product_A 2019-01-24 9 2019-06-14 2 141 days
#> 3 product_A 2019-10-25 5 NA NA NA days
#> 4 product_B 2018-09-18 12 2018-11-01 2 44 days
#> 5 product_B 2018-10-25 30 2019-02-06 1 104 days
#> 6 product_B 2019-03-16 4 2019-06-13 2 89 days
Created on 2019-07-30 by the reprex package (v0.3.0)