How to compare one value with the rest of the values within a Group

Hi All,

I would like to compare one value of a column with the rest of the values within a group.
Given the example data below : I would like to compare any AESTDT overlapped within a group of Patient and AEPT. So Patient 1234 with AEPT as THROMBOCYTOPENIA should not have overlapped AESTDT as you can see in row no. 2, the AESTDT is in between AESTDT and AEENDT of rowNo. 1 for the same patient 1234 with THROMBOCYTOPENIA as AEPT. The last two records of the below sample data also been overlapped. So i would like to find out those type of overlapped record within the Group of Patient and its AEPT.
Thanks a lot for your help.

PATIENT AESTDTC AEENDTC AETERM1 AEPT
1234 1-Sep-19 29-Sep-19 THROMBOCYTOPENIA THROMBOCYTOPENIA
1234 20-Sep-19 1-Oct-19 THROMBOCYTOPENIA THROMBOCYTOPENIA
1234 1-Dec-19 3-Dec-19 THROMBOCYTOPENIA THROMBOCYTOPENIA
1234 14-Jan-14 27-Jan-14 ANEMIA ANAEMIA
1234 1-Jan-18 5-Jan-18 NEUTROPENIA NEUTROPENIA
1234 3-Jan-18 4-Jan-18 NEUTROPENIA NEUTROPENIA
1234 4-Jan-18 31-Jan-18 NEUTROPENIA NEUTROPENIA
1234 3-Jul-19 30-Jul-19 NEUTROPENIA NEUTROPENIA
1234 30-Jul-19 30-Aug-19 NEUTROPENIA NEUTROPENIA
1234 4-Oct-19 30-Oct-19 NEUTROPENIA NEUTROPENIA
3134 17-Jan-17 3-Mar-17 SWOLLEN LEFT WRIST JOINT SWELLING
3134 2-Jan-16 4-Jan-16 LEG PAIN PAIN IN EXTREMITY
3134 8-Jun-19 NA HYPERPHOSPHATEMIA HYPERPHOSPHATAEMIA
3134 3-Aug-19 4-Nov-19 NEUTROPENIA NEUTROPENIA
3134 16-Jan-20 NA LEUKOPENIA LEUKOPENIA
3134 25-Jan-14 31-Jan-14 ANEMIA ANAEMIA
3134 16-Jan-14 NA WORSENING ANEMIA ANAEMIA
3134 5-Dec-19 30-Dec-19 WORSENING THROMBOCYT THROMBOCYTOPENIA
3134 1-Feb-14 28-Feb-14 WORSENING ANEMIA ANAEMIA
3134 4-Sep-12 7-Sep-12 LEUKOPENIA LEUKOPENIA
3134 21-Aug-12 4-Sep-12 LEUKOPENIA LEUKOPENIA

In future, please post your data in a copy & paste-friendly format by following the guide here. It's tedious to recreate data from the format you've shared so I'm only going to recreate a few records.

I'm making a couple of assumptions regarding your question. There is no variable in your dataset called AESTDT so I presume you mean AESTDTC. Secondly, since you've said that the last two records have overlapping intervals, I presume that entries having the same AESTDTC and AEENDTC are also considered as overlapping.

Your puzzle is challenging and the solution I've provided is somewhat ugly but does the job.

library(tidyverse)
library(lubridate, warn.conflicts = FALSE)

df <- tribble(
  ~PATIENT, ~AESTDTC, ~AEENDTC, ~AETERM1, ~AEPT,
  1234, "1-Sep-19", "29-Sep-19", "THROMBOCYTOPENIA", "THROMBOCYTOPENIA",
  1234, "20-Sep-19", "1-Oct-19", "THROMBOCYTOPENIA", "THROMBOCYTOPENIA",
  1234, "1-Dec-19", "3-Dec-19", "THROMBOCYTOPENIA", "THROMBOCYTOPENIA",
  1234, "4-Jan-18", "31-Jan-18", "NEUTROPENIA", "NEUTROPENIA",
  1234, "3-Jul-19", "30-Jul-19", "NEUTROPENIA", "NEUTROPENIA",
  3134, "16-Jan-20", NA, "LEUKOPENIA", "LEUKOPENIA",
  3134, "4-Sep-12", "7-Sep-12", "LEUKOPENIA", "LEUKOPENIA",
  3134, "21-Aug-12", "4-Sep-12", "LEUKOPENIA", "LEUKOPENIA"
)

df <- mutate_at(df, vars(AESTDTC, AEENDTC), dmy)

intervals <- df %>%
  mutate(AE_interval = interval(AESTDTC, AEENDTC)) %>%
  group_by(PATIENT, AEPT) %>%
  summarise(AE_interval = list(AE_interval), .groups = "drop")

records_with_overlap <- df %>%
  left_join(intervals, by = c("PATIENT", "AEPT")) %>%
  mutate(overlap = map2(AESTDTC, AE_interval, `%within%`)) %>%
  rowwise() %>%
  mutate(overlap_count = reduce(overlap, sum)) %>%
  ungroup() %>%
  filter(overlap_count > 1)

semi_join(df, records_with_overlap)
#> Joining, by = c("PATIENT", "AESTDTC", "AEENDTC", "AETERM1", "AEPT")
#> # A tibble: 2 x 5
#>   PATIENT AESTDTC    AEENDTC    AETERM1          AEPT            
#>     <dbl> <date>     <date>     <chr>            <chr>           
#> 1    1234 2019-09-20 2019-10-01 THROMBOCYTOPENIA THROMBOCYTOPENIA
#> 2    3134 2012-09-04 2012-09-07 LEUKOPENIA       LEUKOPENIA

Created on 2020-09-04 by the reprex package (v0.3.0)

As a possible alternative consider this code:

library(tidyverse)
library(lubridate, warn.conflicts = FALSE)

df <- tribble(
  ~PATIENT, ~AESTDTC, ~AEENDTC, ~AETERM1, ~AEPT,
  1234, "1-Sep-19", "29-Sep-19", "THROMBOCYTOPENIA", "THROMBOCYTOPENIA",
  1234, "20-Sep-19", "1-Oct-19", "THROMBOCYTOPENIA", "THROMBOCYTOPENIA",
  1234, "1-Dec-19", "3-Dec-19", "THROMBOCYTOPENIA", "THROMBOCYTOPENIA",
  1234, "4-Jan-18", "31-Jan-18", "NEUTROPENIA", "NEUTROPENIA",
  1234, "3-Jul-19", "30-Jul-19", "NEUTROPENIA", "NEUTROPENIA",
  3134, "16-Jan-20", NA, "LEUKOPENIA", "LEUKOPENIA",
  3134, "4-Sep-12", "7-Sep-12", "LEUKOPENIA", "LEUKOPENIA",
  3134, "21-Aug-12", "4-Sep-12", "LEUKOPENIA", "LEUKOPENIA"
) %>% 
mutate_at(vars(AESTDTC, AEENDTC), dmy)

calendar <- data.frame(date = seq(from = min(df$AESTDTC), 
                                  to = max(df$AEENDTC, na.rm = T),
                                  by = 1))

output <- df %>% 
  crossing(calendar) %>% 
  filter(date > AESTDTC & date <= AEENDTC) %>% 
  count(PATIENT, AEPT, date) %>%
  filter(n > 1)

It is based on the old Data Warehousing technique of dissolving SCD2 type historization via a cross join to calendar object. In SQL context this is done via a cross join, the {tidyverse} equivalent is tidyr::crossing(). In place of a calendar table I am using a simple dataframe of dates covering the range of AESTDTC and AEENDTC (with NAs removed).

Once the patient data is cross joined to calendar and the dates filtered for validity (ie. only those that fall into interval between AESTDTC and AEENDTC are retained) it is only the simple matter of counting duplicates.

Thanks a lot. It really helped :slight_smile:

Thanks a lot for your help :slight_smile:

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.