I have a data-frame that shows the most recent flight purchases of a product (denoted by purchase_date ) AND the total # of returned bookings of the time period before that denoted by (return_date).
I am trying to create a variable that shows the percentage of re-bookings based on the most recent purchases and the total most recent returns in a given time period.
INPUT
I have two data-frames:
DF1 - shows the total bookings of a product based on the product's return date.
product return_date total_bookings
<fct> <date> <dbl>
# product_A 2018-05-26 4
# product_A 2019-01-24 9
# product_A 2019-10-25 5
# product_B 2018-09-18 12
# product_B 2018-10-25 30
# product_B 2019-03-16 4
DF2 - shows the total most recent bookings of a product for those who recently returned from a trip (i.e. date_last_return). The idea is that it shows the number of people who purchased a product again since their last return.
product date_last_return purchase_date bookings
<fct> <date> <date> <dbl>
# product_A 2018-10-25 2019-02-12 2
# product_A 2019-01-24 2019-06-14 2
# product_A 2018-05-26 2019-06-28 2
# product_B 2018-09-18 2018-11-01 2
# product_B 2018-10-25 2019-02-06 1
# product_B 2019-03-16 2019-06-13 2
Desired Results
What I'm trying to do is to aggregate all the total "returned bookings" from DF1 based on if the return date was 6 months before the purchase_date.
For example:
product purchase_date re-bookings total_6month_return
1 product_A 2019-02-12 2 9
2 product_A 2019-06-14 2 9
3 product_A 2019-06-28 2 9
4 product_B 2018-11-01 2 42
5 product_B 2019-02-06 1 42
6 product_B 2019-06-13 2 4
I am having great difficultly creating a way to aggregate DF1 by the purchase_date in DF2 and can't seem to find a solution on how to accomplish this.
If you have any coding suggestions in R on how I could do this, I would greatly appreciate it. I'm thinking a for loop would be appropriate but I simply do not know