Calculate how long each row's time period overlaps with other rows

Each row records the start time and end time of a time period. To simply, we assume all are in the same day, so we don't need to bother other more difficult issues.

I need to calculate how long each row's time period overlaps with other rows. And the number of rows that have such overlaps. Suppose I have a data set. I want column D and column E.

There is no sample data. I just use this fake data set to make an example.


dat <- data.frame(id=1:4, 
                                    Start = c("02:50","02:55","03:15","03:25"),
                   End = c("03:10", "03:05", "03:20", "03:30"),
                   num_overlap = c(1,1,0,0),
                   time_overlap = c(10,10,0,0))


few things are not clear to me ,

of overlap is calculated : is this number by hour or minuts ,,

the time of overlap is it by minutes or hours
can you please provide more details on what you want to calculate ? .. and the result you want it to be shown in minutes or hours ... ?
thank you

Thank you for considering my question ! I appreciate.

Time of overlap is by minutes. For example, in my fake data, person A has start time 2:50 (means 02:50) and end time (03:10). A has an overlap with B: 2:55-3:05, and it's 10 minutes.

I input 02:50 in excel and excel returns 2:50. One more example could be: A has start time 15:10 and end time 15:30. B has start time 15:20 and end time 15:40. Thus there is an overlap and time of overlap is 10 minutes. Of course, there could be person C, D,E,,,, and there could be more than one overlap. The # of overlap is the number of overlap. If A has overlap with B and C, then # is 2. The time of overlap is the sum of these overlap time periods.

Dear Vitler

can you please share your excel formula which you are using so i can have a better understanding about it . :wink:

am still unable to figure out how the number of overlap was calculated in this example and the time of overlap .

thank you

Hi Nedallo, there is no formula in my excel fake data set. I just make some examples.

The number of overlap is counted. One time period is from 3 to 4, the other time period is from 3:30 to 4:30. So there is ONE overlap and the time of overlap is 30 minutes. Because both periods have one same time interval: 3:30 - 4.

This will be a pretty computation intensive task. You would basically split all episodes, such that there are no overlaps left. This can be done by duplicating each overlapping row after sorting for start and end time and then adjusting.

In your example you would make a larger dataset, namely with A duplicated twice, adjusting the time of A1 to actual start up to beginning of B, then A2 congruent to B and lastly A3 beginning with end of B up to the actual end.

After that is done, you have to identify congruent intervals and flag them, then calculate durations of those intervals.

In the last step you would collapse the data back to actual start and end, but summing all count flags and durations to one value.

Since you could have infinitely many overlapping intervals, depending on the number of persons you have, you will have to think about a lot of cases which could break the above thoughts. You will also need a lot of flags to find out, which intervals overlap in what way, since there are four possibilities you have to account for.

So it is definitely possible, but really not trivial and potentially problematic regarding computational resources, given the possible infinitely many overlaps.