determining indicator for time overlapping of some interval

I have

 household       person     start time   end time
     1           1          07:45:00    21:45:00
     1           2          09:45:00    17:45:00
     1           3          22:45:00    23:45:00
     1           4          08:45:00    01:45:00
     1           1          23:50:00    24:00:00
     2           1          07:45:00    21:45:00
     2           2          016:45:00   22:45:00

I want to find a column to find overlapping time between family members.

The indicator is: if a person's start and end time has intersection with another member is 1 otherwise 0

In the above example first family, the time of first, second and forth persons have intersection.

output:

 household       person     start time   end time      overlap
      1           1          07:45:00    21:45:00           1
      1           2          09:45:00    17:45:00           1
      1           3          22:45:00    23:45:00           0
      1           4          08:45:00    01:45:00           1
      1           1          23:50:00    24:00:00           0     
      2           1          07:45:00    21:45:00           1
      2           2          016:45:00   22:45:00           1

Hi @sherek. I cannot think of a simple way to do it but there must be a simpler solution. My suggest is to use two apply to compare each conditions and are little bit complicated. And I also concern when comparing times, the situation of midnight will make trouble like line 4 in data table. So, if possible, include the date together with time will be better.

library(tidyverse)

df <- tribble(
  ~household, ~person, ~`start time`, ~`end time`,
  1, 1, "07:45:00", "21:45:00",
  1, 2, "09:45:00", "17:45:00",
  1, 3, "22:45:00", "23:45:00",
  1, 4, "08:45:00", "01:45:00",
  1, 1, "23:50:00", "24:00:00",
  2, 1, "07:45:00", "21:45:00",
  2, 2, "16:45:00", "22:45:00") 

overlap <- apply(df, 1, function(x) {
  res <- apply(df, 1, function(y) {
    if(y[1] != x[1]) {
      return(FALSE)
    }
    if(y[2] == x[2]) {
      return(FALSE)
    }
    if(all(c(strptime(x[3], "%H:%M:%S"), strptime(x[4], "%H:%M:%S")) < strptime(y[3], "%H:%M:%S"))) {
      return(FALSE)
    }
    if(all(c(strptime(x[3], "%H:%M:%S"), strptime(x[4], "%H:%M:%S")) > strptime(y[4], "%H:%M:%S"))) {
      return(FALSE)
    }
    return(TRUE)
  })
  any(res)
})

mutate(df, overlap = as.numeric(overlap))

My solution:

  1. Join the data against itself by household, so that each record is matched against all others in the same household.
  2. Ignore matches of records against themselves.
  3. See if any of the matches overlap.
  4. Collapse those down by unique interval (household by person by start time), flagging any interval that overlaps at least one other interval.

Writing this as data.table code:

library(data.table)

mydata <- data.table(
  household = c(1, 1, 1, 1, 1, 2, 2),
  person = c(1, 2, 3, 4, 1, 1, 2),
  start_time = c(
    "07:45:00", "09:45:00", "22:45:00", "08:45:00",
    "23:50:00", "07:45:00", "16:45:00"
  ),
  end_time = c(
    "21:45:00", "17:45:00", "23:45:00", "01:45:00",
    "24:00:00", "21:45:00", "22:45:00"
  )
)

# Step 1: join dataset to itself
mydata2 <- mydata[
  mydata,
  on = "household",
  allow.cartesian = TRUE
][
  # Step 2: ignore self-self pairs
  person != i.person,
  {
    # Step 3: overlap is when either start is between the other start-end range
    pair_overlap <- between(.BY[["start_time"]], i.start_time, i.end_time) |
      between(i.start_time, .BY[["start_time"]], .BY[["end_time"]])
    # Step 4: collapse to a single value
    list(overlap = any(pair_overlap))
  },
  by = list(household, person, start_time, end_time)
]

mydata2
#    household person start_time end_time overlap
# 1:         1      2   09:45:00 17:45:00    TRUE
# 2:         1      3   22:45:00 23:45:00   FALSE
# 3:         1      4   08:45:00 01:45:00    TRUE
# 4:         1      1   07:45:00 21:45:00    TRUE
# 5:         1      1   23:50:00 24:00:00   FALSE
# 6:         2      2   16:45:00 22:45:00    TRUE
# 7:         2      1   07:45:00 21:45:00    TRUE

Without the comments, the code's fairly compact. If you're used to basic data.table syntax, it's also easy to read.

mydata2 <- mydata[
  mydata,
  on = "household",
  allow.cartesian = TRUE
][
  person != i.person,
  list(overlap = any(
    between(.BY[["start_time"]], i.start_time, i.end_time) |
      between(i.start_time, .BY[["start_time"]], .BY[["end_time"]])
  )),
  by = list(household, person, start_time, end_time)
]
1 Like