Percent increase and diff regardless of the date order(in R)

I have a dataset, df, where I wish to calculate the percent increase of the sum of a particular group over a time period. Here is the dataset:

   date      size               type
   1/1/2020  3                   a
   1/1/2020  13                  b
   1/1/2020  1                   c
   2/1/2020  51                  a
   2/1/2019  10                  b

Desired output

Then find percent diff and diff from earliest date,

date       diff     percentdiff    type

2/1/2020   48       1600           a
1/1/2020   3        30             b
1/1/2020   0        0              c   

We see that group 'a' went from 3 to 51, (from 1/1/2020 to 2/1/2020) which gives us a difference of 48, and a percent difference of 1600%
Group c is 0 because there is no change.

Percent Increase/Change is  final-inital/initial * 100

This is what I have tried:

  df %>% 
group_by(date, type) %>% 
summarise(size = sum(size)) %>%
arrange(type, date) %>% 
group_by(type) %>% 
summarise(date = last(date),
          diff = last(size) - first(size),
          increase = scales::percent(diff / first(size)))

This works great, thanks to a member on this platform, however, what could I do if I have a case where I wish to take the diff from the earliest date to the latest. My output should give me +3 and not -3

    dput:
    structure(list(date = c("1/1/2020", "1/1/2002", "1/1/2020", "2/1/2020", 
    "2/1/2019"), size = c(3L, 13L, 1L, 51L, 10L), type = c("a", "b", 
   "c", "a", "b")), class = "data.frame", row.names = c(NA, -5L))

I am still researching this. Any suggestion is appreciated.

Hello,

Here is my implementation:

library(dplyr)

#Input data
myData = structure(list(
  date = c("1/1/2020", "1/1/2002", "1/1/2020", "2/1/2020", "2/1/2019"), 
  size = c(3L, 13L, 1L, 51L, 10L), 
  type = c("a", "b", "c", "a", "b")), 
  class = "data.frame", row.names = c(NA, -5L))

#Convert the date to mm/dd/yyyy date format
myData = myData %>% mutate(
  date = as.Date(date, format = "%m/%d/%y")
)

#Compare change from first to last
myData %>% group_by(type) %>% 
  summarise(diff = first(size) - last(size),
            percent = diff / last(size) * 100,
            .groups = "drop")
#> # A tibble: 3 x 3
#>   type   diff percent
#>   <chr> <int>   <dbl>
#> 1 a       -48   -94.1
#> 2 b         3    30  
#> 3 c         0     0

#Compare change from last to first
myData %>% group_by(type) %>% 
  summarise(diff = last(size) - first(size),
            percent = diff / first(size) * 100,
            .groups = "drop")
#> # A tibble: 3 x 3
#>   type   diff percent
#>   <chr> <int>   <dbl>
#> 1 a        48  1600  
#> 2 b        -3   -23.1
#> 3 c         0     0

#Compare absolute change with % increase between smallest and largest
myData %>% group_by(type) %>% 
  summarise(diff = abs(last(size) - first(size)),
            percent = diff / min(first(size), last(size)) * 100,
            .groups = "drop")
#> # A tibble: 3 x 3
#>   type   diff percent
#>   <chr> <int>   <dbl>
#> 1 a        48    1600
#> 2 b         3      30
#> 3 c         0       0

Created on 2020-11-05 by the reprex package (v0.3.0)

Hope this helps,
PJ

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.