 # Percent increase and diff regardless of the date order(in R)

I have a dataset, df, where I wish to calculate the percent increase of the sum of a particular group over a time period. Here is the dataset:

``````   date      size               type
1/1/2020  3                   a
1/1/2020  13                  b
1/1/2020  1                   c
2/1/2020  51                  a
2/1/2019  10                  b
``````

Desired output

Then find percent diff and diff from earliest date,

``````date       diff     percentdiff    type

2/1/2020   48       1600           a
1/1/2020   3        30             b
1/1/2020   0        0              c
``````

We see that group 'a' went from 3 to 51, (from 1/1/2020 to 2/1/2020) which gives us a difference of 48, and a percent difference of 1600%
Group c is 0 because there is no change.

``````Percent Increase/Change is  final-inital/initial * 100
``````

This is what I have tried:

``````  df %>%
group_by(date, type) %>%
summarise(size = sum(size)) %>%
arrange(type, date) %>%
group_by(type) %>%
summarise(date = last(date),
diff = last(size) - first(size),
increase = scales::percent(diff / first(size)))
``````

This works great, thanks to a member on this platform, however, what could I do if I have a case where I wish to take the diff from the earliest date to the latest. My output should give me +3 and not -3

``````    dput:
structure(list(date = c("1/1/2020", "1/1/2002", "1/1/2020", "2/1/2020",
"2/1/2019"), size = c(3L, 13L, 1L, 51L, 10L), type = c("a", "b",
"c", "a", "b")), class = "data.frame", row.names = c(NA, -5L))
``````

I am still researching this. Any suggestion is appreciated.

Hello,

Here is my implementation:

``````library(dplyr)

#Input data
myData = structure(list(
date = c("1/1/2020", "1/1/2002", "1/1/2020", "2/1/2020", "2/1/2019"),
size = c(3L, 13L, 1L, 51L, 10L),
type = c("a", "b", "c", "a", "b")),
class = "data.frame", row.names = c(NA, -5L))

#Convert the date to mm/dd/yyyy date format
myData = myData %>% mutate(
date = as.Date(date, format = "%m/%d/%y")
)

#Compare change from first to last
myData %>% group_by(type) %>%
summarise(diff = first(size) - last(size),
percent = diff / last(size) * 100,
.groups = "drop")
#> # A tibble: 3 x 3
#>   type   diff percent
#>   <chr> <int>   <dbl>
#> 1 a       -48   -94.1
#> 2 b         3    30
#> 3 c         0     0

#Compare change from last to first
myData %>% group_by(type) %>%
summarise(diff = last(size) - first(size),
percent = diff / first(size) * 100,
.groups = "drop")
#> # A tibble: 3 x 3
#>   type   diff percent
#>   <chr> <int>   <dbl>
#> 1 a        48  1600
#> 2 b        -3   -23.1
#> 3 c         0     0

#Compare absolute change with % increase between smallest and largest
myData %>% group_by(type) %>%
summarise(diff = abs(last(size) - first(size)),
percent = diff / min(first(size), last(size)) * 100,
.groups = "drop")
#> # A tibble: 3 x 3
#>   type   diff percent
#>   <chr> <int>   <dbl>
#> 1 a        48    1600
#> 2 b         3      30
#> 3 c         0       0
``````

Created on 2020-11-05 by the reprex package (v0.3.0)

Hope this helps,
PJ

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.