R data.table: R: Aggregating values from one Date-Time period based on the relation to another Date-Time period

I have a data-frame that shows the most recent flight purchases of a product (denoted by purchase_date ) AND the total # of returned bookings of the time period before that denoted by (return_date).

I am trying to create a variable that shows the percentage of re-bookings based on the most recent purchases and the total most recent returns in a given time period.

INPUT

I have two data-frames:

DF1 - shows the total bookings of a product based on the product's return date.

product      return_date  total_bookings
  <fct>        <date>     <dbl>
# product_A   2018-05-26     4
# product_A   2019-01-24     9
# product_A   2019-10-25     5
# product_B   2018-09-18    12
# product_B   2018-10-25    30
# product_B   2019-03-16     4

DF2 - shows the total most recent bookings of a product for those who recently returned from a trip (i.e. date_last_return). The idea is that it shows the number of people who purchased a product again since their last return.

product          date_last_return  purchase_date   bookings
  <fct>              <date>           <date>       <dbl>
# product_A        2018-10-25       2019-02-12       2
# product_A        2019-01-24       2019-06-14       2
# product_A        2018-05-26       2019-06-28       2
# product_B        2018-09-18       2018-11-01       2
# product_B        2018-10-25       2019-02-06       1
# product_B        2019-03-16       2019-06-13       2

Desired Results

What I'm trying to do is to aggregate all the total "returned bookings" from DF1 based on if the return date was 6 months before the purchase_date.

For example:

product      purchase_date  re-bookings  total_6month_return

1 product_A   2019-02-12     2                 9               
2 product_A   2019-06-14     2                 9
3 product_A   2019-06-28     2                 9
4 product_B   2018-11-01     2                42
5 product_B   2019-02-06     1                42
6 product_B   2019-06-13     2                 4

I am having great difficultly creating a way to aggregate DF1 by the purchase_date in DF2 and can't seem to find a solution on how to accomplish this.

If you have any coding suggestions in R on how I could do this, I would greatly appreciate it. I'm thinking a for loop would be appropriate but I simply do not know

I'm a little confused as to what the relationship is between DF1 and DF2 (note that I've created a reprex below, which makes it much easier for others to help you). With the exception of one date, it looks like the product and return_dates are combined as primary keys, but then you aggregate across them in the last table?

Do you have unique IDs for the transactions? It might help a bit.

suppressPackageStartupMessages(library(tidyverse))
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df1 <- tibble::tribble(
       ~product, ~return_date, ~total_bookings,
    "product_A", "2018-05-26",               4,
    "product_A", "2019-01-24",               9,
    "product_A", "2019-10-25",               5,
    "product_B", "2018-09-18",              12,
    "product_B", "2018-10-25",              30,
    "product_B", "2019-03-16",               4
    ) %>%
  mutate(return_date = parse_date(return_date, format = "%Y-%m-%d"))


df2 <- tibble::tribble(
     ~product, ~date_last_return, ~purchase_date, ~bookings,
  "product_A",      "2018-10-25",   "2019-02-12",         2,
  "product_A",      "2019-01-24",   "2019-06-14",         2,
  "product_A",      "2018-05-26",   "2019-06-28",         2,
  "product_B",      "2018-09-18",   "2018-11-01",         2,
  "product_B",      "2018-10-25",   "2019-02-06",         1,
  "product_B",      "2019-03-16",   "2019-06-13",         2
  ) %>%
  mutate(date_last_return = parse_date(date_last_return, format = "%Y-%m-%d"),
         purchase_date = parse_date(purchase_date, format = "%Y-%m-%d"))

df1 %>%
  left_join(df2, by = c("product", c("return_date" = "date_last_return"))) %>%
  mutate("diff_date" = purchase_date - return_date)
#> # A tibble: 6 x 6
#>   product   return_date total_bookings purchase_date bookings diff_date
#>   <chr>     <date>               <dbl> <date>           <dbl> <drtn>   
#> 1 product_A 2018-05-26               4 2019-06-28           2 398 days 
#> 2 product_A 2019-01-24               9 2019-06-14           2 141 days 
#> 3 product_A 2019-10-25               5 NA                  NA  NA days 
#> 4 product_B 2018-09-18              12 2018-11-01           2  44 days 
#> 5 product_B 2018-10-25              30 2019-02-06           1 104 days 
#> 6 product_B 2019-03-16               4 2019-06-13           2  89 days

Created on 2019-07-30 by the reprex package (v0.3.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.