Keeping duplicated rows within/between IDs

Hi, I want to create two different datasets: one keeping only the rows that are duplicated within an ID, and one keeping only the rows that are duplicated between IDs. Could anyone shed some light on this?


I have a dataset df with 4 columns (ID, Visit, SYSBP and DIABP) as follows:

A 1 120 80
A 2 130 80
B 1 130 75
B 2 120 80
B 3 130 75
C 1 130 80
C 2 130 80

Now I want to create df_within as:

B 1 130 75
B 3 130 75
C 1 130 80
C 2 130 80

and df_between as:

A 1 120 80
B 2 120 80
A 2 130 80
C 1 130 80
A 2 130 80
C 2 130 80

How do I do this?

For the first output you can do something like this. Unfortunately, I do not understand the logic for the second output; maybe with this reprex someone else can help you with that.

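library(dplyr)

df <- data.frame(stringsAsFactors=FALSE,
                 ID = c("A", "A", "B", "B", "B", "C", "C"),
                 Visit = c(1, 2, 1, 2, 3, 1, 2),
                 SYSBP = c(120, 130, 130, 120, 130, 130, 130),
                 DIABP = c(80, 80, 75, 80, 75, 80, 80))

# Count rows per ID/SYSBP/DIABP combination, keep combinations seen at least
# twice, then drop the helper count column
df %>% 
    add_count(ID, SYSBP, DIABP) %>% 
    filter(n >= 2) %>% 
    select(-n)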
#> # A tibble: 4 x 4
#>   ID    Visit SYSBP DIABP
#>   <chr> <dbl> <dbl> <dbl>
#> 1 B         1   130    75
#> 2 B         3   130    75
#> 3 C         1   130    80
#> 4 C         2   130    80
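For comparison, the same df_within result can probably also be obtained in base R with duplicated(). This is only an alternative sketch, not the approach used above; the helper names key and is_dup are made up for illustration.

```r
df <- data.frame(stringsAsFactors = FALSE,
                 ID = c("A", "A", "B", "B", "B", "C", "C"),
                 Visit = c(1, 2, 1, 2, 3, 1, 2),
                 SYSBP = c(120, 130, 130, 120, 130, 130, 130),
                 DIABP = c(80, 80, 75, 80, 75, 80, 80))

# Columns that define a "duplicate" within an ID (hypothetical helper name)
key <- df[c("ID", "SYSBP", "DIABP")]

# duplicated() only flags the second and later occurrences, so scan from both
# ends to flag every member of a duplicated combination
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

df_within <- df[is_dup, ]
df_within
```

Scanning from both ends matters here: with the forward scan alone, the first row of each duplicated combination (B visit 1, C visit 1) would be dropped.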

One (quite ugly) way of getting df_between is as follows; I hope it can be made much simpler. The first pipeline reproduces df_within, the second builds df_between:

library(dplyr)

df <- data.frame(stringsAsFactors = FALSE,
                 ID = c("A", "A", "B", "B", "B", "C", "C"),
                 Visit = c(1, 2, 1, 2, 3, 1, 2),
                 SYSBP = c(120, 130, 130, 120, 130, 130, 130),
                 DIABP = c(80, 80, 75, 80, 75, 80, 80))

# df_within: keep ID/SYSBP/DIABP combinations that occur more than once
df %>%
  group_by(ID, SYSBP, DIABP) %>%
  filter(n() > 1) %>%
  ungroup()
#> # A tibble: 4 x 4
#>   ID    Visit SYSBP DIABP
#>   <chr> <dbl> <dbl> <dbl>
#> 1 B         1   130    75
#> 2 B         3   130    75
#> 3 C         1   130    80
#> 4 C         2   130    80

df %>%
  group_by(SYSBP, DIABP) %>%
  filter(n_distinct(ID) > 1) %>%
  group_split() %>%
  lapply(FUN = function(t) {
    apply(X = combn(x = nrow(x = t),
                    m = 2),
          MARGIN = 2,
          FUN = function(z) t[z, ])
  }) %>%
  lapply(FUN = function(t) {
    if (length(x = t) > 1) {
      lapply(X = t,
             FUN = function(z) filter(.data = z,
                                      n_distinct(ID) > 1))
    } else {
      t
    }
  }) %>%
  lapply(FUN = function(t) do.call(what = rbind,
                                   args = t)) %>%
  bind_rows()
#> # A tibble: 6 x 4
#>   ID    Visit SYSBP DIABP
#>   <chr> <dbl> <dbl> <dbl>
#> 1 A         1   120    80
#> 2 B         2   120    80
#> 3 A         2   130    80
#> 4 C         1   130    80
#> 5 A         2   130    80
#> 6 C         2   130    80
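A possibly simpler route to df_between is a self-join on the two blood pressure columns. The following is a rough base R sketch, not the method used above: it assumes "between" means every pair of rows from two different IDs with identical SYSBP and DIABP, and the names pairs and pair are invented for illustration.

```r
df <- data.frame(stringsAsFactors = FALSE,
                 ID = c("A", "A", "B", "B", "B", "C", "C"),
                 Visit = c(1, 2, 1, 2, 3, 1, 2),
                 SYSBP = c(120, 130, 130, 120, 130, 130, 130),
                 DIABP = c(80, 80, 75, 80, 75, 80, 80))

# Self-join: every combination of two rows sharing the same SYSBP and DIABP
pairs <- merge(df, df, by = c("SYSBP", "DIABP"), suffixes = c(".x", ".y"))

# ID.x < ID.y keeps each cross-ID pair exactly once and drops same-ID pairs
pairs <- pairs[pairs$ID.x < pairs$ID.y, ]
pairs$pair <- seq_len(nrow(pairs))  # hypothetical pair label

# Stack both members of each pair, then sort so the two rows sit together
df_between <- rbind(
  data.frame(pair = pairs$pair, ID = pairs$ID.x, Visit = pairs$Visit.x,
             SYSBP = pairs$SYSBP, DIABP = pairs$DIABP),
  data.frame(pair = pairs$pair, ID = pairs$ID.y, Visit = pairs$Visit.y,
             SYSBP = pairs$SYSBP, DIABP = pairs$DIABP)
)
df_between <- df_between[order(df_between$pair), ]
df_between
```

As in the desired output, a row that matches several rows from other IDs (A visit 2 here) appears once per pair. The order of the pairs may differ from the output shown above.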

Many thanks. I was able to do this in a shorter way by applying the lead and lag functions to the IDs. Thanks anyway. Cheers!
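The lead/lag code itself was not posted. As a guess at what such an approach might look like for df_within, here is a minimal dplyr sketch; the sort order and the filter condition are assumptions, not the poster's actual solution.

```r
library(dplyr)

df <- data.frame(stringsAsFactors = FALSE,
                 ID = c("A", "A", "B", "B", "B", "C", "C"),
                 Visit = c(1, 2, 1, 2, 3, 1, 2),
                 SYSBP = c(120, 130, 130, 120, 130, 130, 130),
                 DIABP = c(80, 80, 75, 80, 75, 80, 80))

df_within <- df %>%
  # Sort so that rows with equal readings, and equal IDs within them, are adjacent
  arrange(SYSBP, DIABP, ID, Visit) %>%
  group_by(SYSBP, DIABP) %>%
  # Keep a row if the previous or next row in its group has the same ID;
  # coalesce() turns the NA at the group edges into FALSE
  filter(coalesce(ID == lag(ID), FALSE) | coalesce(ID == lead(ID), FALSE)) %>%
  ungroup()

df_within
```

Looking only at neighbouring rows is enough for the within case because sorting by ID places equal IDs side by side. The same trick with ID != lag(ID) does not carry over directly to df_between, since a row can match an ID that is not its immediate neighbour.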

