So! I work a lot with time intervals (we're a call center business) and often I need to calculate load per hour. For example, how many minutes did our agents work in a particular hour? Here's a example of what a typical data frame looks like for me:
library(tidyverse)
library(lubridate)
library(padr)
df <- tribble(
~id, ~start_time, ~end_time,
1L, "2018-07-01 09:10:00", "2018-07-01 09:45:00",
2L, "2018-07-01 09:15:00", "2018-07-01 09:30:00",
3L, "2018-07-01 09:20:00", "2018-07-01 10:20:00",
4L, "2018-07-01 09:25:00", "2018-07-01 10:05:00",
5L, "2018-07-01 10:05:00", "2018-07-01 10:15:00"
) %>%
mutate_if(is.character, ymd_hms) %>%
mutate(mins = end_time - start_time)
df
#> # A tibble: 5 x 4
#> id start_time end_time mins
#> <int> <dttm> <dttm> <time>
#> 1 1 2018-07-01 09:10:00 2018-07-01 09:45:00 35 mins
#> 2 2 2018-07-01 09:15:00 2018-07-01 09:30:00 15 mins
#> 3 3 2018-07-01 09:20:00 2018-07-01 10:20:00 60 mins
#> 4 4 2018-07-01 09:25:00 2018-07-01 10:05:00 40 mins
#> 5 5 2018-07-01 10:05:00 2018-07-01 10:15:00 10 mins
To calculate the "load" per hour, I have traditionally done it the quick way by taking the interval length (mins) between start_time
and end_time
, and summing up the minutes based on the start_time
(using padr
to "thicken" the intervals).
df %>%
padr::thicken("hour", by = "start_time", col = "hour") %>%
group_by(hour) %>%
summarise(mins = sum(mins))
#> # A tibble: 2 x 2
#> hour mins
#> <dttm> <time>
#> 1 2018-07-01 09:00:00 150 mins
#> 2 2018-07-01 10:00:00 " 10 mins"
However, that isn't exactly right. There wasn't 150 mins of load in the 9:00 hour, and 10 mins in the 10:00 hour. It's actually 125 and 35 mins respectively.
# What I want:
#> # A tibble: 2 x 2
#> hour mins
#> <dttm> <time>
#> 1 2018-07-01 09:00:00 125 mins
#> 2 2018-07-01 10:00:00 35 mins
In other words, how do I accurately calculate load for intervals which start and end times fall in different hourly periods?
I have a long & complex script to do this for me, by dividing up the data frame into intervals within a hour, intervals crossing a hour (and I calculate the part before the hour and the part after the hour separately), and full-hour periods (for longer calls, e.g., 9:30-11:45, I divide it into 9:30-10:00, 10:00-11:00, and 11:00-11:45, then calculate each interval separately).
All this seems crazy for what sounds like a common thing to do, but I've been unable to find a package or function that does these calculations out of the box. Hoping someone here knows something!