Calculating difference between row entries using conditional lag

Aarun · January 4, 2021, 7:33am

Hi, I'm a beginner in R and need some help in calculating difference between row entries using a condition from another column. In the below dataset i need to create a column with difference in column A with the condition that the difference in A should be calculated with the previous entry in which the corresponding C is a non zero and grouped by ID.

The expected output would look like as in Column X, any help is highly appreciated.
I apologize if the format of question or readability is not clear.

ID	A	C	X
1	0	12	0
1	2	0	2
1	4	0	4
1	6	12	6
1	42	12	36
1	46	0	4
1	48	12	6
1	60	12	12
1	66	12	6
1	70	0	4
2	0	19	0
2	1.5	0	1.5
2	4.5	0	4.5
2	6	19	6
2	12	19	6
2	16	0	4
2	18	19	6
2	22	0	4
2	24	19	6

fauxneticien · January 4, 2021, 8:04am

Maybe there's a better solution, but my immediate thought would be to set up an intermediary variable (say, temp_A) that stores the value of the last value of A (where C != 0). In the code below, I used ifelse() to get the values of A when C != 0 (setting NA if C = 0), then used the na.locf function to carry forward this non-NA value until the next one, then used lag() to offset these values so that the current observation isn't included (e.g. for row 4, you want to subtract 0, not 6, which is the latest non-NA value). Hope that's understandable!

df <- tibble(
  ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  A = c(0, 2, 4, 6, 42, 46, 48, 60, 66, 70, 0, 1.5, 4.5, 6, 12, 16, 18, 22, 24),
  C = c(12, 0, 0, 12, 12, 0, 12, 12, 12, 0, 19, 0, 0, 19, 19, 0, 19, 0, 19)
  )

df %>%
  group_by(ID) %>% 
  mutate(
    temp_A = ifelse(C != 0, A, NA) %>% zoo::na.locf() %>% lag(n = 1, default = 0),
    X      = A - temp_A,
  )

# A tibble: 19 x 5
# Groups:   ID [2]
      ID     A     C temp_A     X
   <dbl> <dbl> <dbl>  <dbl> <dbl>
 1     1   0      12      0   0  
 2     1   2       0      0   2  
 3     1   4       0      0   4  
 4     1   6      12      0   6  
 5     1  42      12      6  36  
 6     1  46       0     42   4  
 7     1  48      12     42   6  
 8     1  60      12     48  12  
 9     1  66      12     60   6  
10     1  70       0     66   4  
11     2   0      19      0   0  
12     2   1.5     0      0   1.5
13     2   4.5     0      0   4.5
14     2   6      19      0   6  
15     2  12      19      6   6  
16     2  16       0     12   4  
17     2  18      19     12   6  
18     2  22       0     18   4  
19     2  24      19     18   6

technocrat · January 4, 2021, 8:34am

I had the same thought until looking at the desired output, where X is never A_{{lag}_1}.

suppressPackageStartupMessages({
  library(dplyr)
})
# avoid built in names; it can lead to the function being given procedence over your object and an error refering to a 'closure'
DF <- tibble(
  ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  A = c(0, 2, 4, 6, 42, 46, 48, 60, 66, 70, 0, 1.5, 4.5, 6, 12, 16, 18, 22, 24),
  C = c(12, 0, 0, 12, 12, 0, 12, 12, 12, 0, 19, 0, 0, 19, 19, 0, 19, 0, 19)
)


DF %>%
  group_by(ID) %>%
  mutate(X = ifelse(A < 1, 0, abs(C - A)))
#> # A tibble: 19 x 4
#> # Groups:   ID [2]
#>       ID     A     C     X
#>    <dbl> <dbl> <dbl> <dbl>
#>  1     1   0      12   0  
#>  2     1   2       0   2  
#>  3     1   4       0   4  
#>  4     1   6      12   6  
#>  5     1  42      12  30  
#>  6     1  46       0  46  
#>  7     1  48      12  36  
#>  8     1  60      12  48  
#>  9     1  66      12  54  
#> 10     1  70       0  70  
#> 11     2   0      19   0  
#> 12     2   1.5     0   1.5
#> 13     2   4.5     0   4.5
#> 14     2   6      19  13  
#> 15     2  12      19   7  
#> 16     2  16       0  16  
#> 17     2  18      19   1  
#> 18     2  22       0  22  
#> 19     2  24      19   5

^{Created on 2021-01-04 by the reprex package (v0.3.0.9001)}

Aarun · January 4, 2021, 9:49am

Thank You for the clear explanation. This approach gives the answer I'm looking for

Aarun · January 4, 2021, 9:58am

Thanks for trying out the question. The value of X is Alag with different lag (1,2 ,3,...) based on the conditions from C

system · January 11, 2021, 9:59am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.