Percent difference and raw difference with consecutive but odd number of dates

I have a dataset, df, where I wish to find the percent diff and diff. I wish to look at the earliest date and compare this value to the next date:

 id    date         value

 1     11/01/2020   10
 2     11/01/2020   5
 1     10/01/2020   20
 2     10/01/2020   30
 1     09/01/2020   15
 2     09/01/2020   10
 3     11/01/2020   5

Desired output

  id    date          diff   percent


  1     10/01/2020    5       33                 
  1     11/01/2020   -10     -50
  2     10/01/2020    20      200               
  2     11/01/2020   -25   -83.33
  3     11/01/2020     0       0 

I am wanting to look at one group at a time and compare the previous value to the next value and find the percent increase and diff.

For example,

ID 1, from 09/01/2020 to 10/01/2020 : goes from 15 to 20,
giving a difference of 5
percent difference is 33%

from 10/01/2020 to 11/01/2020: goes from 20 to 10,
difference of -10 and a 50% percent difference.

This what I am doing:

   df %>% 
  group_by(date, id) %>% 
  summarise(diff = last(value) - first(value),
          increase = scales::percent(diff / first(value))) %>% 
  arrange(id, date)






 dput:
  structure(list(id = c(1L, 2L, 1L, 2L, 1L, 2L, 3L), date = c("11/1/2020", 
  "11/1/2020", "10/1/2020", "10/1/2020", "9/1/2020", "9/1/2020", 
 "11/1/2020"), value = c(10L, 5L, 20L, 30L, 15L, 10L, 5L)), class = "data.frame", 
 row.names = c(NA, 
 -7L))

Any suggestion is appreciated

There's a few things to change, you want differences within each id not within each id and date. I chose to use the lag function which gets the previous value. Finally, you need to convert the date to be a date and not a character so the sorting works correctly. Otherwise, September (9) will come after October (10).

library(tidyverse)

df <- structure(list(id = c(1L, 2L, 1L, 2L, 1L, 2L, 3L), 
                     date = c("11/1/2020", "11/1/2020", "10/1/2020", "10/1/2020", "9/1/2020", "9/1/2020", "11/1/2020"), 
                     value = c(10L, 5L, 20L, 30L, 15L, 10L, 5L)), 
                class = "data.frame", 
                row.names = c(NA, -7L))

df %>% 
  mutate(date=lubridate::mdy(date)) %>% # need to convert this to be a date for sorting
  arrange(id, date) %>% #order by id and date
  group_by(id) %>% #find lag differences within each id
  mutate(
    diff=value-lag(value),
    increase=scales::percent(diff / lag(value))
  ) %>%
  filter(row_number()!=1) #remove first date of each id because it isn't needed
#> # A tibble: 4 x 5
#> # Groups:   id [2]
#>      id date       value  diff increase
#>   <int> <date>     <int> <int> <chr>   
#> 1     1 2020-10-01    20     5 33%     
#> 2     1 2020-11-01    10   -10 -50%    
#> 3     2 2020-10-01    30    20 200%    
#> 4     2 2020-11-01     5   -25 -83%

Created on 2020-11-04 by the reprex package (v0.3.0)

1 Like

Thank you Stephanie, this works! I will research more on the lag function!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.