Subtract rows by year

Hi,
I have a tibble that has values for several years for several indexes. Something like this:

suppressMessages(library(tidyverse))

df <- tribble(
  ~idx, ~year, ~value,
  "A", 2016, 10,
  "A", 2017, 15,
  "A", 2018, 12,
  "B", 2016, 21,
  "B", 2018, 15,
  "C", 2017, 15
)
df
#> # A tibble: 6 x 3
#>   idx    year value
#>   <chr> <dbl> <dbl>
#> 1 A      2016    10
#> 2 A      2017    15
#> 3 A      2018    12
#> 4 B      2016    21
#> 5 B      2018    15
#> 6 C      2017    15

I want to add a column with a "delta" to this tibble that is defined as the value minus the value of the previous year (for each index) or NA if delta can't be calculated. So, the desired output should look like this:

df_after <- tribble(
  ~idx, ~year, ~value, ~delta,
  "A", 2016, 10, NA,
  "A", 2017, 15, 5,
  "A", 2018, 12, -3,
  "B", 2016, 21, NA,
  "B", 2018, 15, NA,
  "C", 2017, 15, NA
)
df_after
#> # A tibble: 6 x 4
#>   idx    year value delta
#>   <chr> <dbl> <dbl> <dbl>
#> 1 A      2016    10    NA
#> 2 A      2017    15     5
#> 3 A      2018    12    -3
#> 4 B      2016    21    NA
#> 5 B      2018    15    NA
#> 6 C      2017    15    NA

I thought to group tibble by idx and use something like x - lag(x) but it doesn't work as some years are missed (like in the example above for idx "B"). Does anyone have an idea how to solve this problem?

This works but it might need to be adapted for real examples:

df %>% 
  group_by(idx) %>% 
  mutate(delta = ifelse(year - lag(year) == 1, value - lag(value), NA)) %>% 
  ungroup()
1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.