cj_risk
September 23, 2019, 5:00pm
1
I am doing some analysis on bank data, and I'm having difficulty tidying the data set. I also had difficulty putting this in a reprex, so I thought it was easiest to just show a small sample of the data. The data represents transactions that have start and end date that can vary from one month to multiple years. I need to transform the transaction data into a series of monthly transactions based on the transaction term. This looks to be gathering exercise, but I could not figure out to break the dates into monthly pieces.
The original data set is as follows:
I need to get the data into the following format:
Any suggestions on how to do this would be greatly appreciated.
Thanks
1 Like
Certainly is easiest for you, but it makes it harder to help you, could you at least share the sample data on a copy/paste friendly format? Have a look to this blog post by Mara that explains how to do it.
1 Like
cj_risk
September 24, 2019, 3:28am
4
My apologies. Here is the reprex with the data. As mentioned above, I am trying to break the data into monthly series for each transaction.
library(tidyverse)
library(lubridate)
df <- tibble(
transaction_id= c(1, 2),
trade_date= c(ymd(20190701, 20190801)),
start=c(ymd(20190801, 20191201)),
end=c(ymd(20191031, 20200228)),
price=c(5,6),
currency=c("CAD","CAD"),
volume=c(10000,5000))
I need to get the data into the following format:
Any suggestions would be appreciated.
Thanks
I think this is what you want
library(tidyverse)
library(lubridate)
df <- data.frame(stringsAsFactors=FALSE,
transaction_id = c("1", "2"),
trade_date = as.Date(c("2019-07-01", "2019-08-01")),
start = as.Date(c("2019-08-01", "2019-12-01")),
end = as.Date(c("2019-10-31", "2020-02-28")),
price = c(5, 6),
currency = c("CAD", "CAD"),
volume = c(10000, 5000)
)
# Sample data
df
#> transaction_id trade_date start end price currency volume
#> 1 1 2019-07-01 2019-08-01 2019-10-31 5 CAD 10000
#> 2 2 2019-08-01 2019-12-01 2020-02-28 6 CAD 5000
# Desired output
df %>%
group_by(transaction_id) %>%
mutate(start=list(seq(start, end, "months"))) %>%
unnest_longer(c(start)) %>%
mutate(end = ceiling_date(start, "month") - 1)
#> # A tibble: 6 x 7
#> # Groups: transaction_id [2]
#> transaction_id trade_date start end price currency volume
#> <chr> <date> <date> <date> <dbl> <chr> <dbl>
#> 1 1 2019-07-01 2019-08-01 2019-08-31 5 CAD 10000
#> 2 1 2019-07-01 2019-09-01 2019-09-30 5 CAD 10000
#> 3 1 2019-07-01 2019-10-01 2019-10-31 5 CAD 10000
#> 4 2 2019-08-01 2019-12-01 2019-12-31 6 CAD 5000
#> 5 2 2019-08-01 2020-01-01 2020-01-31 6 CAD 5000
#> 6 2 2019-08-01 2020-02-01 2020-02-29 6 CAD 5000
Created on 2019-09-24 by the reprex package (v0.3.0.9000)
1 Like
cj_risk
September 24, 2019, 3:57am
6
Thanks. That is the right output, but when I run the code I get the following error
Error in unnest_longer(., c(start)) : could not find function "unnest_longer"
Am I missing a package? Tidyverse is loaded.
Thanks
Also, a small modification to keep the last end date as 2020-02-28
library(tidyverse)
library(lubridate)
df %>%
group_by(transaction_id) %>%
mutate(start=list(seq(start, end, "months"))) %>%
unnest_longer(c(start)) %>%
rowwise() %>%
mutate(end = min(ceiling_date(start, "month") - 1, end)) %>%
ungroup()
#> # A tibble: 6 x 7
#> transaction_id trade_date start end price currency volume
#> <chr> <date> <date> <date> <dbl> <chr> <dbl>
#> 1 1 2019-07-01 2019-08-01 2019-08-31 5 CAD 10000
#> 2 1 2019-07-01 2019-09-01 2019-09-30 5 CAD 10000
#> 3 1 2019-07-01 2019-10-01 2019-10-31 5 CAD 10000
#> 4 2 2019-08-01 2019-12-01 2019-12-31 6 CAD 5000
#> 5 2 2019-08-01 2020-01-01 2020-01-31 6 CAD 5000
#> 6 2 2019-08-01 2020-02-01 2020-02-28 6 CAD 5000
1 Like
Thank. I didn't catch the day adjustment.
system
Closed
November 2, 2019, 8:32pm
10
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.