I have a dataset, df, where I wish to calculate the percent increase of the sum of a particular group over a time period. Here is the dataset:
date size type
1/1/2020 3 a
1/1/2020 13 b
1/1/2020 1 c
2/1/2020 51 a
2/1/2019 10 b
Desired output
Then find percent diff and diff from earliest date,
date diff percentdiff type
2/1/2020 48 1600 a
1/1/2020 3 30 b
1/1/2020 0 0 c
We see that group 'a' went from 3 to 51, (from 1/1/2020 to 2/1/2020) which gives us a difference of 48, and a percent difference of 1600%
Group c is 0 because there is no change.
Percent Increase/Change is final-inital/initial * 100
This is what I have tried:
df %>%
group_by(date, type) %>%
summarise(size = sum(size)) %>%
arrange(type, date) %>%
group_by(type) %>%
summarise(date = last(date),
diff = last(size) - first(size),
increase = scales::percent(diff / first(size)))
This works great, thanks to a member on this platform, however, what could I do if I have a case where I wish to take the diff from the earliest date to the latest. My output should give me +3 and not -3
dput:
structure(list(date = c("1/1/2020", "1/1/2002", "1/1/2020", "2/1/2020",
"2/1/2019"), size = c(3L, 13L, 1L, 51L, 10L), type = c("a", "b",
"c", "a", "b")), class = "data.frame", row.names = c(NA, -5L))
I am still researching this. Any suggestion is appreciated.