# Growth Increase, Groupby over time in R

I have a dataset, df, where I am trying to calculate the percent increase of a particular group over a time period. Here is the dataset:

``````    date      size       type

1/1/2020   1          a
1/1/2020   1          a
1/1/2020   3          a
1/1/2020   1          b
1/1/2020   2          b
1/1/2020   0          b
2/1/2020   5          a
2/1/2020   6          a
2/1/2020   3          a
2/1/2020   20         b
2/1/2020   21         b
2/1/2020   30         b
``````

Desired output

``````  date         increase   diff   type

1/1/2020     200%       2      a
1/1/2020     -40%      -2      a
2/1/2020     -100      -1      b
2/1/2020      50%      10      b

Percent Increase/Change is  final-inital/initial * 100

example for a , we start at 1 then end at 3,
which is a 200% increase in the month of January
``````

This is what I am doing:

``````df %>%
group_by(type, size) %>%
mutate(
percent_increase = ((lead(size) - size)/size) * 100,
begin = date,
filter(!is.na(percent_increase)) %>%
arrange(location)
``````

However, my dates are not grouping correctly. Since I only have dates that are consecutive first of the month, I figure I can just use freq = '1D'

``````    date

1/1/2020
1/1/2020
1/1/2020
1/1/2020
``````

I suppose I would have to groupby the type as well as the date, but I am not sure and I am still researching this.

Any suggestion is appreciated.

dput:
structure(list(Date = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"1/1/2020", "1/1/2020", "2/1/2020", "2/1/2020", "2/1/2020", "2/1/2020",
"2/1/2020", "2/1/2020"), size = c(1L, 1L, 3L, 1L, 2L, 0L, 5L,
6L, 3L, 20L, 21L, 30L), type = c("a", "a", "a", "b", "b", "b",
"a", "a", "a", "b", "b", "b")), class = "data.frame", row.names = c(NA,
-12L))

Can you make your initial dataset reproducible?

either use `dput()` or `reprex()`. e.g. paste the output of `dput(df)`

This is one way to do it

``````library(dplyr)

sample_df <- data.frame(
stringsAsFactors = FALSE,
date = c("1/1/2020","1/1/2020",
"1/1/2020","1/1/2020","1/1/2020","1/1/2020","2/1/2020",
"2/1/2020","2/1/2020","2/1/2020","2/1/2020","2/1/2020"),
size = c(1, 1, 3, 1, 2, 0, 5, 6, 3, 20, 21, 30),
type = c("a","a","a","b","b","b",
"a","a","a","b","b","b")
)

sample_df %>%
group_by(date, type) %>%
summarise(diff = last(size) - first(size),
increase = scales::percent(diff / first(size))) %>%
arrange(type, date)
#> `summarise()` regrouping output by 'date' (override with `.groups` argument)
#> # A tibble: 4 x 4
#> # Groups:   date [2]
#>   date     type   diff increase
#>   <chr>    <chr> <dbl> <chr>
#> 1 1/1/2020 a         2 200%
#> 2 2/1/2020 a        -2 -40%
#> 3 1/1/2020 b        -1 -100%
#> 4 2/1/2020 b        10 50%
``````

Created on 2020-11-03 by the reprex package (v0.3.0.9001)

Note: Next time, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

1 Like

Thank you. Updated post

Thank you. I have updated with dput. I will look into Reprex as well.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.