How to track changes across columns

I cannot share the data, but the principle of what I want to achieve is easy - across several columns, I want to filter out the rows that do not change (from row to row, but only when non of the columns change). Overall, I want to track changes in the rows, if this makes it more simple.

What would be the best way to filter the dataset that way?

Thank you in advance.

Here is one possible solution, though it isn't very elegant.

library(dplyr)

DF <- data.frame(A=c(1,1,1,1,1),B=c(1,3,1,1,1),C=c(1,1,1,1,2))
#Row 4 should be removed
DF
#>   A B C
#> 1 1 1 1
#> 2 1 3 1
#> 3 1 1 1
#> 4 1 1 1
#> 5 1 1 2
lagged <- rbind(c(NA,NA,NA),DF[-nrow(DF),])
colnames(lagged) <- paste("L",colnames(lagged),sep="_")
lagged                
#>   L_A L_B L_C
#> 1  NA  NA  NA
#> 2   1   1   1
#> 3   1   3   1
#> 4   1   1   1
#> 5   1   1   1
Combinded <- cbind(DF,lagged)
Combinded
#>   A B C L_A L_B L_C
#> 1 1 1 1  NA  NA  NA
#> 2 1 3 1   1   1   1
#> 3 1 1 1   1   3   1
#> 4 1 1 1   1   1   1
#> 5 1 1 2   1   1   1
Combinded <- Combinded |> rowwise() |> 
  mutate(Flag = all(c_across(A:C) == c_across(L_A:L_C))) 

DF <- Combinded |> filter(is.na(Flag) | !Flag) |> 
  select(A:C)
DF
#> # A tibble: 4 x 3
#> # Rowwise: 
#>       A     B     C
#>   <dbl> <dbl> <dbl>
#> 1     1     1     1
#> 2     1     3     1
#> 3     1     1     1
#> 4     1     1     2

Created on 2022-08-06 by the reprex package (v2.0.1)

Thank you so much for taking the time to go through this! Do you think I can somehow replace lagging step with diff() function?

library(dplyr)

DF <- data.frame(A=c(1,1,1,1,1),B=c(1,3,1,1,1),C=c(1,1,1,1,2))
DF
#>   A B C
#> 1 1 1 1
#> 2 1 3 1
#> 3 1 1 1
#> 4 1 1 1
#> 5 1 1 2
DF <- DF |> mutate(across(.cols = everything(),
                    .fns = function(V) c(NA,diff(V)),
                    .names = "L_{.col}"))                                        
DF
#>   A B C L_A L_B L_C
#> 1 1 1 1  NA  NA  NA
#> 2 1 3 1   0   2   0
#> 3 1 1 1   0  -2   0
#> 4 1 1 1   0   0   0
#> 5 1 1 2   0   0   1
DF <- DF|> rowwise() |> 
  mutate(Flag = all(c_across(L_A:L_C)==0)) 
DF
#> # A tibble: 5 x 7
#> # Rowwise: 
#>       A     B     C   L_A   L_B   L_C Flag 
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1     1     1     1    NA    NA    NA NA   
#> 2     1     3     1     0     2     0 FALSE
#> 3     1     1     1     0    -2     0 FALSE
#> 4     1     1     1     0     0     0 TRUE 
#> 5     1     1     2     0     0     1 FALSE
DF <- DF |> filter(is.na(Flag) | !Flag) |> 
  select(A:C)
DF
#> # A tibble: 4 x 3
#> # Rowwise: 
#>       A     B     C
#>   <dbl> <dbl> <dbl>
#> 1     1     1     1
#> 2     1     3     1
#> 3     1     1     1
#> 4     1     1     2

Created on 2022-08-07 by the reprex package (v2.0.1)

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.