Tidying Transaction Data

I am doing some analysis on bank data, and I'm having difficulty tidying the data set. I also had difficulty putting this in a reprex, so I thought it was easiest to just show a small sample of the data. The data represents transactions that have start and end date that can vary from one month to multiple years. I need to transform the transaction data into a series of monthly transactions based on the transaction term. This looks to be gathering exercise, but I could not figure out to break the dates into monthly pieces.

The original data set is as follows:

I need to get the data into the following format:

Any suggestions on how to do this would be greatly appreciated.

Thanks

1 Like

Certainly is easiest for you, but it makes it harder to help you, could you at least share the sample data on a copy/paste friendly format? Have a look to this blog post by Mara that explains how to do it.

1 Like

My apologies. Here is the reprex with the data. As mentioned above, I am trying to break the data into monthly series for each transaction.


library(tidyverse)
library(lubridate)

df <- tibble(
  transaction_id= c(1, 2),
  trade_date= c(ymd(20190701, 20190801)),
  start=c(ymd(20190801, 20191201)),
  end=c(ymd(20191031, 20200228)),
  price=c(5,6),
  currency=c("CAD","CAD"),
  volume=c(10000,5000))

I need to get the data into the following format:

Any suggestions would be appreciated.

Thanks

I think this is what you want

library(tidyverse)
library(lubridate)

df <- data.frame(stringsAsFactors=FALSE,
   transaction_id = c("1", "2"),
       trade_date = as.Date(c("2019-07-01", "2019-08-01")),
            start = as.Date(c("2019-08-01", "2019-12-01")),
              end = as.Date(c("2019-10-31", "2020-02-28")),
            price = c(5, 6),
         currency = c("CAD", "CAD"),
           volume = c(10000, 5000)
)

# Sample data
df
#>   transaction_id trade_date      start        end price currency volume
#> 1              1 2019-07-01 2019-08-01 2019-10-31     5      CAD  10000
#> 2              2 2019-08-01 2019-12-01 2020-02-28     6      CAD   5000

# Desired output
df %>% 
    group_by(transaction_id) %>% 
    mutate(start=list(seq(start, end, "months"))) %>% 
    unnest_longer(c(start)) %>% 
    mutate(end = ceiling_date(start, "month") - 1)
#> # A tibble: 6 x 7
#> # Groups:   transaction_id [2]
#>   transaction_id trade_date start      end        price currency volume
#>   <chr>          <date>     <date>     <date>     <dbl> <chr>     <dbl>
#> 1 1              2019-07-01 2019-08-01 2019-08-31     5 CAD       10000
#> 2 1              2019-07-01 2019-09-01 2019-09-30     5 CAD       10000
#> 3 1              2019-07-01 2019-10-01 2019-10-31     5 CAD       10000
#> 4 2              2019-08-01 2019-12-01 2019-12-31     6 CAD        5000
#> 5 2              2019-08-01 2020-01-01 2020-01-31     6 CAD        5000
#> 6 2              2019-08-01 2020-02-01 2020-02-29     6 CAD        5000

Created on 2019-09-24 by the reprex package (v0.3.0.9000)

1 Like

Thanks. That is the right output, but when I run the code I get the following error

Error in unnest_longer(., c(start)) : could not find function "unnest_longer"

Am I missing a package? Tidyverse is loaded.

Thanks

You have to update tidyr

2 Likes

Also, a small modification to keep the last end date as 2020-02-28

library(tidyverse)
library(lubridate)

df %>% 
    group_by(transaction_id) %>% 
    mutate(start=list(seq(start, end, "months"))) %>% 
    unnest_longer(c(start)) %>%
    rowwise() %>% 
    mutate(end = min(ceiling_date(start, "month") - 1, end)) %>% 
    ungroup()
#> # A tibble: 6 x 7
#>   transaction_id trade_date start      end        price currency volume
#>   <chr>          <date>     <date>     <date>     <dbl> <chr>     <dbl>
#> 1 1              2019-07-01 2019-08-01 2019-08-31     5 CAD       10000
#> 2 1              2019-07-01 2019-09-01 2019-09-30     5 CAD       10000
#> 3 1              2019-07-01 2019-10-01 2019-10-31     5 CAD       10000
#> 4 2              2019-08-01 2019-12-01 2019-12-31     6 CAD        5000
#> 5 2              2019-08-01 2020-01-01 2020-01-31     6 CAD        5000
#> 6 2              2019-08-01 2020-02-01 2020-02-28     6 CAD        5000
1 Like

Thank. I didn't catch the day adjustment.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.