Growth Increase, Groupby over time in R

I have a dataset, df, where I am trying to calculate the percent increase of a particular group over a time period. Here is the dataset:

    date      size       type

   1/1/2020   1          a
   1/1/2020   1          a
   1/1/2020   3          a
   1/1/2020   1          b
   1/1/2020   2          b
   1/1/2020   0          b
   2/1/2020   5          a
   2/1/2020   6          a
   2/1/2020   3          a
   2/1/2020   20         b
   2/1/2020   21         b
   2/1/2020   30         b

Desired output

  date         increase   diff   type

  1/1/2020     200%       2      a
  1/1/2020     -40%      -2      a
  2/1/2020     -100      -1      b
  2/1/2020      50%      10      b



Percent Increase/Change is  final-inital/initial * 100

example for a , we start at 1 then end at 3, 
which is a 200% increase in the month of January

This is what I am doing:

df %>%
group_by(type, size) %>% 
mutate(
increase = (lead(size) - size),
percent_increase = ((lead(size) - size)/size) * 100,
begin = date,
end = lead(date)) %>% 
filter(!is.na(percent_increase)) %>% 
arrange(location)

However, my dates are not grouping correctly. Since I only have dates that are consecutive first of the month, I figure I can just use freq = '1D'

    date             

   1/1/2020            
   1/1/2020             
   1/1/2020             
   1/1/2020            

I suppose I would have to groupby the type as well as the date, but I am not sure and I am still researching this.

Any suggestion is appreciated.

dput:
structure(list(Date = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"1/1/2020", "1/1/2020", "2/1/2020", "2/1/2020", "2/1/2020", "2/1/2020",
"2/1/2020", "2/1/2020"), size = c(1L, 1L, 3L, 1L, 2L, 0L, 5L,
6L, 3L, 20L, 21L, 30L), type = c("a", "a", "a", "b", "b", "b",
"a", "a", "a", "b", "b", "b")), class = "data.frame", row.names = c(NA,
-12L))

Can you make your initial dataset reproducible?

either use dput() or reprex(). e.g. paste the output of dput(df)

This is one way to do it

library(dplyr)

sample_df <- data.frame(
  stringsAsFactors = FALSE,
              date = c("1/1/2020","1/1/2020",
                       "1/1/2020","1/1/2020","1/1/2020","1/1/2020","2/1/2020",
                       "2/1/2020","2/1/2020","2/1/2020","2/1/2020","2/1/2020"),
              size = c(1, 1, 3, 1, 2, 0, 5, 6, 3, 20, 21, 30),
              type = c("a","a","a","b","b","b",
                       "a","a","a","b","b","b")
)

sample_df %>% 
    group_by(date, type) %>% 
    summarise(diff = last(size) - first(size),
              increase = scales::percent(diff / first(size))) %>% 
    arrange(type, date)
#> `summarise()` regrouping output by 'date' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups:   date [2]
#>   date     type   diff increase
#>   <chr>    <chr> <dbl> <chr>   
#> 1 1/1/2020 a         2 200%    
#> 2 2/1/2020 a        -2 -40%    
#> 3 1/1/2020 b        -1 -100%   
#> 4 2/1/2020 b        10 50%

Created on 2020-11-03 by the reprex package (v0.3.0.9001)

Note: Next time, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

1 Like

Thank you. Updated post

Thank you. I have updated with dput. I will look into Reprex as well.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.