Hi R Masters,
I have this task I cannot finish myself.
I have a small dummy df with some duplicated emails. Now, I would like to calculate number of days between the first and last response and flag records with more than 90 days. This it what I've done so far:
data.source <-
data.frame(stringsAsFactors=FALSE,
EmailAddress = c("aaa", "aaa", "bbb", "bbb", "ccc", "ccc", "ccc", "ccc",
"ddd", "eee", "fff"),
CreatedOn = c("2019-09-05 16:41:28", "2019-12-10 10:40:44",
"2019-11-07 17:31:45", "2019-12-10 10:40:13", "2019-08-05 16:41:28",
"2019-12-10 10:40:44", "2019-11-07 17:31:45", "2019-12-10 10:40:13",
"2019-11-07 17:31:45", "2019-12-10 10:40:13", "2019-11-07 17:31:45")
)
data.source
library(tidyverse)
library(lubridate)
test <- data.source %>%
group_by(EmailAddress) %>%
arrange(CreatedOn)%>%
mutate(Sequence = row_number()) %>%
ungroup()
test
Now I think I should calculate days for records duplicates and for multiple duplicates, this gap should be between the first and the last response.
As a result I should get:
95.7 days for "aaa", 32.7 days for "bbb" and 126.7 day for "ccc".
"aaa" and "ccc" should get a flag they were duplicated beyond 90 days period.
Is this tack easy?