Distributing minutes of an interval across several hours

foundinblank · December 27, 2018, 3:56pm

So! I work a lot with time intervals (we're a call center business) and often I need to calculate load per hour. For example, how many minutes did our agents work in a particular hour? Here's a example of what a typical data frame looks like for me:

library(tidyverse)
library(lubridate)
library(padr)

df <- tribble(
  ~id, ~start_time, ~end_time,
  1L, "2018-07-01 09:10:00", "2018-07-01 09:45:00",
  2L, "2018-07-01 09:15:00", "2018-07-01 09:30:00",
  3L, "2018-07-01 09:20:00", "2018-07-01 10:20:00",
  4L, "2018-07-01 09:25:00", "2018-07-01 10:05:00",
  5L, "2018-07-01 10:05:00", "2018-07-01 10:15:00"
) %>%
  mutate_if(is.character, ymd_hms) %>%
  mutate(mins = end_time - start_time)

df

#> # A tibble: 5 x 4
#>      id start_time          end_time            mins   
#>   <int> <dttm>              <dttm>              <time> 
#> 1     1 2018-07-01 09:10:00 2018-07-01 09:45:00 35 mins
#> 2     2 2018-07-01 09:15:00 2018-07-01 09:30:00 15 mins
#> 3     3 2018-07-01 09:20:00 2018-07-01 10:20:00 60 mins
#> 4     4 2018-07-01 09:25:00 2018-07-01 10:05:00 40 mins
#> 5     5 2018-07-01 10:05:00 2018-07-01 10:15:00 10 mins

To calculate the "load" per hour, I have traditionally done it the quick way by taking the interval length (mins) between start_time and end_time, and summing up the minutes based on the start_time (using padr to "thicken" the intervals).

df %>%
  padr::thicken("hour", by = "start_time", col = "hour") %>%
  group_by(hour) %>%
  summarise(mins = sum(mins))

#> # A tibble: 2 x 2
#>   hour                mins      
#>   <dttm>              <time>    
#> 1 2018-07-01 09:00:00 150 mins  
#> 2 2018-07-01 10:00:00 " 10 mins"

However, that isn't exactly right. There wasn't 150 mins of load in the 9:00 hour, and 10 mins in the 10:00 hour. It's actually 125 and 35 mins respectively.

# What I want: 

#> # A tibble: 2 x 2
#>   hour                mins      
#>   <dttm>              <time>    
#> 1 2018-07-01 09:00:00 125 mins  
#> 2 2018-07-01 10:00:00 35 mins

In other words, how do I accurately calculate load for intervals which start and end times fall in different hourly periods?

I have a long & complex script to do this for me, by dividing up the data frame into intervals within a hour, intervals crossing a hour (and I calculate the part before the hour and the part after the hour separately), and full-hour periods (for longer calls, e.g., 9:30-11:45, I divide it into 9:30-10:00, 10:00-11:00, and 11:00-11:45, then calculate each interval separately).

All this seems crazy for what sounds like a common thing to do, but I've been unable to find a package or function that does these calculations out of the box. Hoping someone here knows something!

mishabalyasin · December 27, 2018, 4:38pm

Just as a quick aside - I do have exact this feeling fairly often and most of the time there is such function.

That being said, in this case specifically, I don't know of any such function, so here is one way of getting what you want. In fact, it is fairly similar to your "long and complex" script, except in my case it's not that long, I hope.

Basic idea is exactly as you've stated - separate long calls into periods and then use periods to calculate the entire load in one hour:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df <- tribble(
  ~id, ~start_time, ~end_time,
  1L, "2018-07-01 09:10:00", "2018-07-01 09:45:00",
  2L, "2018-07-01 09:15:00", "2018-07-01 09:30:00",
  3L, "2018-07-01 09:20:00", "2018-07-01 10:20:00",
  4L, "2018-07-01 09:25:00", "2018-07-01 10:05:00",
  5L, "2018-07-01 10:05:00", "2018-07-01 10:15:00"
) %>%
  mutate_if(is.character, ymd_hms) %>%
  mutate(mins = end_time - start_time) 

periodize <- function(start_time, end_time){
  sh <- lubridate::hour(start_time)
  sm <- lubridate::minute(start_time)
  eh <- lubridate::hour(end_time)
  em <- lubridate::minute(end_time)
  
  n_periods <- (eh - sh)
  
  hour <- sh:eh
  sms <- c(sm, rep(0L, n_periods))
  ems <- c(rep(60L, n_periods), em)
  
  tibble::tibble(hour = hour, load = ems - sms)
}

df %>%
  dplyr::mutate(contribution = purrr::map2(start_time, end_time, periodize)) %>%
  dplyr::select(contribution) %>%
  tidyr::unnest() %>%
  dplyr::group_by(hour) %>%
  dplyr::summarise(load = sum(load))
#> # A tibble: 2 x 2
#>    hour  load
#>   <int> <int>
#> 1     9   125
#> 2    10    35

^{Created on 2018-12-27 by the reprex package (v0.2.1)}

This will fail if calls cross midnight, but correcting for that is left as an exercise for the reader

foundinblank · January 9, 2019, 9:29am

Thanks for replying! Sorry it took me a while to write back - your script is way shorter than my "long and complex one" , and I'm going to try it out (probably next week). I think it's VERY clever that you've separated hour and minute values! We do have intervals crossing midnight so I will try to figure that out too.

system · January 30, 2019, 9:29am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.