Hi R Masters,
I have this simple data frame:
source <- data.frame(
stringsAsFactors = FALSE,
URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"),
VIN = c("xxx", "xxx", "yyy", "yyy", "yyy", "zzz", "abc"),
EventDate = c("2019-04-29","2019-11-04",
"2019-06-18","2019-11-21","2020-11-18","2020-01-27",
"2020-08-22"),
Q1 = c(10, 5, 8, 10, 2, 4, 3),
Q2 = c(1, 2, 1, 1, 2, 1, 2),
Q3 = c(1, 4, 3, 2, 1, 2, 4),
Q4 = c(2019, 2020, 2020, 2019, 2020, 2021, 2021),
Sequence = c(1, 2, 1, 2, 3, 0, 0)
)
source
and I am trying to aggregate the data to get another data frame where:
- We have a list only VINs with Sequence 1 or 2 (so VINs repeated twice)
- Numerical values (but lower than 100 to exclude Q4) for these VINs should be differences between the first and the second response (based on EventDate). All other variables can be ignored.
I need to get this:
data.frame(
stringsAsFactors = FALSE,
VIN = c("xxx", "yyy"),
Q1 = c(-5, 2),
Q2 = c(1, 0),
Q3 = c(3, -1)
)
Sequence variable was created by me manually but ideally my code should be able to find duplicates based on EventDate (0- No duplicate, 1- First duplicate, 2- Second Duplicate, 3-Third duplicate).
Is this easy to do?