# Aggregate and Group to find increase and diff

I have a dataset, df, where I wish to calculate the percent increase of the sum of a particular group over a time period. Here is the dataset:

``````     date     size       type

1/1/2020     1          a
1/1/2020     1          a
1/1/2020     1          a
1/1/2020     2          b
1/1/2020     5          b
1/1/2020     6          b
2/1/2020     20         a
2/1/2020     21         a
2/1/2020     10         a
2/1/2020     1          b
2/1/2020     4          b
2/1/2020     5          b
``````

Desired output

``````(grouping by type and date to find sum)

date        size    type

1/1/2020    3       a
2/1/2020    51      a
1/1/2020    13      b
2/1/2020    10      b
``````

(finding the increase and diff by type)

Final Desired Output:

`````` date        type      increase    diff

2/1/2020    a         1600%       48
2/1/2020    b        -23.07%      -3

Percent Increase/Change is  final-inital/initial * 100
``````

I was not sure if I had to break this into two separate parts so I added an intermediate desired output.

This is what I am doing:

``````  df %>% group_by(date, type) %>% mutate_each(funs(pct), c(date, type))
``````

dput:

`````` structure(list(Date = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"1/1/2020", "1/1/2020", "2/1/2020", "2/1/2020", "2/1/2020", "2/1/2020",
"2/1/2020", "2/1/2020"), size = c(1L, 1L, 1L, 2L, 5L, 6L, 20L,
21L, 10L, 1L, 4L, 5L), type = c("a", "a", "a", "b", "b", "b",
"a", "a", "a", "b", "b", "b")), class = "data.frame", row.names = c(NA,
``````

-12L))

Any suggestion is appreciated.
I am still researching the topic.

``````library(dplyr)

df <-  structure(list(Date = c("1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020",
"1/1/2020", "1/1/2020", "2/1/2020", "2/1/2020", "2/1/2020", "2/1/2020",
"2/1/2020", "2/1/2020"), size = c(1L, 1L, 1L, 2L, 5L, 6L, 20L,
21L, 10L, 1L, 4L, 5L), type = c("a", "a", "a", "b", "b", "b",
"a", "a", "a", "b", "b", "b")), class = "data.frame", row.names = c(NA,
-12L))
df %>%
group_by(Date, type) %>%
summarise(size = sum(size)) %>%
arrange(type, Date) %>%
group_by(type) %>%
summarise(Date = last(Date),
diff = last(size) - first(size),
increase = scales::percent(diff / first(size)))
#> `summarise()` regrouping output by 'Date' (override with `.groups` argument)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 4
#>   type  Date      diff increase
#>   <chr> <chr>    <int> <chr>
#> 1 a     2/1/2020    48 1 600%
#> 2 b     2/1/2020    -3 -23%
``````

Created on 2020-11-04 by the reprex package (v0.3.0.9001)

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.