Calc Start and End Times

Hi,

I have a excel file with a list of employees, with the date/time they started work and date/time when they left work.

Example: Colleague 1, Started: 18/11/2019 07:00, Ended 18/11/2019 09:00. Therefore they worked across the hours of 7,8 & 9.

Is it possible to create a summary table such as:

Date/Time Colleagues at work
18/11/2019 07:00 45
18/11/2019 08:00 60
18/11/2019 09:00 45

Thanks,

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

  1. If you have stored the data set in some R object, dput function is very handy.

  2. In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

Hows this?

TimeSheet <- tibble::tribble(
~Colleague.Number, ~Start.Time.(UTC), ~Actual.End.Time.(UTC),
45, "17/11/2019 08:40", "17/11/2019 09:27",
42, "17/11/2019 08:15", "17/11/2019 08:40",
32, "17/11/2019 07:20", "17/11/2019 08:15",
35, "17/11/2019 05:00", "17/11/2019 07:20",
42, "14/11/2019 05:40", "14/11/2019 07:16",
42, "12/11/2019 12:20", "12/11/2019 12:54",
40, "12/11/2019 10:00", "12/11/2019 12:20",
42, "12/11/2019 07:20", "12/11/2019 10:00"
)

I think this does what you want. Notice I changed the start time of one entry so I could confirm that the code would return values other than zero and one.

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
TimeSheet <- tibble::tribble(
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  42, "12/11/2019 12:00", "12/11/2019 12:54",
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  42, "12/11/2019 07:20", "12/11/2019 10:00"
)
  
TimeSheet <- TimeSheet %>% mutate(Start.Time = lubridate::dmy_hm(Start.Time, tz = "UTC"),
                                  Actual.End.Time = lubridate::dmy_hm(Actual.End.Time, tz = "UTC"))

Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-17 10:00:00", tz = "UTC"), "h")

Seq2 <- seq.POSIXt(as.POSIXct("2019-11-14 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-14 08:00:00", tz = "UTC"), "h")

Seq3 <- seq.POSIXt(as.POSIXct("2019-11-12 07:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-12 13:00:00", tz = "UTC"), "h")

DF <- data.frame(DateTime = c(Seq1, Seq2, Seq3))

DF2 <- DF %>% group_by(DateTime) %>% 
  mutate(CollAtWork = sum(DateTime >= TimeSheet$Start.Time & DateTime < TimeSheet$Actual.End.Time))
DF2
#> # A tibble: 17 x 2
#> # Groups:   DateTime [17]
#>    DateTime            CollAtWork
#>    <dttm>                   <int>
#>  1 2019-11-17 05:00:00          1
#>  2 2019-11-17 06:00:00          1
#>  3 2019-11-17 07:00:00          1
#>  4 2019-11-17 08:00:00          1
#>  5 2019-11-17 09:00:00          1
#>  6 2019-11-17 10:00:00          0
#>  7 2019-11-14 05:00:00          0
#>  8 2019-11-14 06:00:00          1
#>  9 2019-11-14 07:00:00          1
#> 10 2019-11-14 08:00:00          0
#> 11 2019-11-12 07:00:00          0
#> 12 2019-11-12 08:00:00          1
#> 13 2019-11-12 09:00:00          1
#> 14 2019-11-12 10:00:00          1
#> 15 2019-11-12 11:00:00          1
#> 16 2019-11-12 12:00:00          2
#> 17 2019-11-12 13:00:00          0

Created on 2019-11-18 by the reprex package (v0.3.0.9000)

2 Likes

Thanks! This is great.

Further to this, how would you add the colleague during the intervals?

For example, if we take the above in DF2, how can we add another column with the colleague that it refers to?

I think this gets to the goal, though there is probably a more elegant method. It assumes there is only one person present at a time and the person whose shift ends exactly on the hour is not counted as present for that last instant, it is the person whose shifts begins on the hour who is listed.

library(dplyr)

TimeSheet <- tibble::tribble(
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  42, "12/11/2019 12:20", "12/11/2019 12:54",
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  42, "12/11/2019 07:20", "12/11/2019 10:00"
)
  
TimeSheet <- TimeSheet %>% mutate(Start.Time = lubridate::dmy_hm(Start.Time, tz = "UTC"),
                                  Actual.End.Time = lubridate::dmy_hm(Actual.End.Time, tz = "UTC"))

Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-17 10:00:00", tz = "UTC"), "h")

Seq2 <- seq.POSIXt(as.POSIXct("2019-11-14 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-14 08:00:00", tz = "UTC"), "h")

Seq3 <- seq.POSIXt(as.POSIXct("2019-11-12 07:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-12 13:00:00", tz = "UTC"), "h")

DF <- data.frame(DateTime = c(Seq1, Seq2, Seq3))

DF2 <- DF %>% group_by(DateTime) %>% 
  mutate(CollAtWork = sum(DateTime >= TimeSheet$Start.Time & DateTime < TimeSheet$Actual.End.Time))

library(lubridate)

HourColleague <- TimeSheet %>% mutate(ID = 1:nrow(TimeSheet), 
                     HourStart = hour(ceiling_date(Start.Time, unit = "hour")),
                     HourEnd = hour(floor_date(Actual.End.Time - 60, unit = "hour"))) %>%
  filter(HourStart <= HourEnd) %>% 
  group_by(ID) %>% 
  mutate(HourSeq = list(tidyr::full_seq(c(HourStart, HourEnd), 1))) %>% 
  tidyr::unnest(cols = HourSeq) %>% 
  mutate(DateTime_bin = make_datetime(year = year(Start.Time),
                                      month = month(Start.Time),
                                      day = day(Start.Time),
                                      hour = HourSeq)) %>%
  ungroup()  
HourColleague <- select(HourColleague, DateTime = DateTime_bin, Colleague.Number)

HourColleague
#> # A tibble: 12 x 2
#>    DateTime            Colleague.Number
#>    <dttm>                         <dbl>
#>  1 2019-11-17 09:00:00               45
#>  2 2019-11-17 08:00:00               32
#>  3 2019-11-17 05:00:00               35
#>  4 2019-11-17 06:00:00               35
#>  5 2019-11-17 07:00:00               35
#>  6 2019-11-14 06:00:00               42
#>  7 2019-11-14 07:00:00               42
#>  8 2019-11-12 10:00:00               40
#>  9 2019-11-12 11:00:00               40
#> 10 2019-11-12 12:00:00               40
#> 11 2019-11-12 08:00:00               42
#> 12 2019-11-12 09:00:00               42

DF3 <- left_join(DF2, HourColleague)
#> Joining, by = "DateTime"
DF3
#> # A tibble: 17 x 3
#> # Groups:   DateTime [17]
#>    DateTime            CollAtWork Colleague.Number
#>    <dttm>                   <int>            <dbl>
#>  1 2019-11-17 05:00:00          1               35
#>  2 2019-11-17 06:00:00          1               35
#>  3 2019-11-17 07:00:00          1               35
#>  4 2019-11-17 08:00:00          1               32
#>  5 2019-11-17 09:00:00          1               45
#>  6 2019-11-17 10:00:00          0               NA
#>  7 2019-11-14 05:00:00          0               NA
#>  8 2019-11-14 06:00:00          1               42
#>  9 2019-11-14 07:00:00          1               42
#> 10 2019-11-14 08:00:00          0               NA
#> 11 2019-11-12 07:00:00          0               NA
#> 12 2019-11-12 08:00:00          1               42
#> 13 2019-11-12 09:00:00          1               42
#> 14 2019-11-12 10:00:00          1               40
#> 15 2019-11-12 11:00:00          1               40
#> 16 2019-11-12 12:00:00          1               40
#> 17 2019-11-12 13:00:00          0               NA

Created on 2019-11-18 by the reprex package (v0.3.0.9000)

2 Likes

Can you see why this would return N/A?

library(dplyr)
library(readxl)

TimeSheet <- read_excel("xxxx")

TimeSheet <- TimeSheet %>% mutate(StartTime = lubridate::ymd_hms(StartTime, tz = "UTC"),
ActualEndTime = lubridate::ymd_hms(ActualEndTime, tz = "UTC"))

Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"),
as.POSIXct("2019-11-17 23:00:00", tz = "UTC"), "hour")

DF <- data.frame(DateTime = c(Seq1))

DF2 <- DF %>% group_by(DateTime) %>%
mutate(CollAtWork= sum(DateTime >= TimeSheet$StartTime & DateTime < TimeSheet$ActualEndTime))

DF2

Without seeing your data, I can only guess. The first place I would look is the output of ymd_hms(). You are reading in an Excel file and Excel will silently change the format of dates. The imported dates may not match the ymd format. If the problem is not there, please post a Reproducible Example with data showing the problem.

Hi FJCC.

Firstly apologies , relatively new to this forum.

The below code is perfect, it works great. Now I want to look at it by minute. So I want to see the number of colleagues by minute, and the first colleague by that minute.

library(dplyr)

TimeSheet <- tibble::tribble(
  
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  
  42, "12/11/2019 12:20", "12/11/2019 12:54",
  
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  
  42, "12/11/2019 07:20", "12/11/2019 10:00"
  
)

TimeSheet <- TimeSheet %>% mutate(Start.Time = lubridate::dmy_hm(Start.Time, tz = "UTC"),
                                  
                                  Actual.End.Time = lubridate::dmy_hm(Actual.End.Time, tz = "UTC"))



Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"), 
                   
                   as.POSIXct("2019-11-17 10:00:00", tz = "UTC"), "h")



Seq2 <- seq.POSIXt(as.POSIXct("2019-11-14 05:00:00", tz = "UTC"), 
                   
                   as.POSIXct("2019-11-14 08:00:00", tz = "UTC"), "h")



Seq3 <- seq.POSIXt(as.POSIXct("2019-11-12 07:00:00", tz = "UTC"), 
                   
                   as.POSIXct("2019-11-12 13:00:00", tz = "UTC"), "h")



DF <- data.frame(DateTime = c(Seq1, Seq2, Seq3))



DF2 <- DF %>% group_by(DateTime) %>% 
  
  mutate(CollAtWork = sum(DateTime >= TimeSheet$Start.Time & DateTime < TimeSheet$Actual.End.Time))



library(lubridate)



HourColleague <- TimeSheet %>% mutate(ID = 1:nrow(TimeSheet), 
                                      
                                      HourStart = hour(ceiling_date(Start.Time, unit = "hour")),
                                      
                                      HourEnd = hour(floor_date(Actual.End.Time - 60, unit = "hour"))) %>%
  
  filter(HourStart <= HourEnd) %>% 
  
  group_by(ID) %>% 
  
  mutate(HourSeq = list(tidyr::full_seq(c(HourStart, HourEnd), 1))) %>% 
  
  tidyr::unnest(cols = HourSeq) %>% 
  
  mutate(DateTime_bin = make_datetime(year = year(Start.Time),
                                      
                                      month = month(Start.Time),
                                      
                                      day = day(Start.Time),
                                      
                                      hour = HourSeq)) %>%
  
  ungroup()  

HourColleague <- select(HourColleague, DateTime = DateTime_bin, Colleague.Number)



HourColleague

DF3 <- left_join(DF2, HourColleague)

#> Joining, by = "DateTime"

DF3


Is this possible?

Hi @FJCC - Would you be able to help? I really appreciate the help you have given so far!

Please show the actual code you have written so we can address the specific problem you are having. The first code I posted calculated how many people are present for each time period. The time sequences in that case were for every hour but you can change that to minutes by changing the last parameter of seq.POSIXt() from "h" to "min".

I am not sure what you mean by "the first colleague by that minute" since the data have a resolution of one minute, so a person can only be present or not present in a particular minute. There is no first and second person within a minute.

1 Like

Hi @FJCC, thanks for coming back.

Basically I want to understand which colleague was present during that particular minute. Where a colleague was not present, return a value of "NA" for example.

So where Colleague ID 45 was working from 0840 to 09:27 (row 1), I want every minute between 08:40 and 09:27 to display a value of: The value 1 to show how many colleagues were working, and second the colleague ID of 45 that was present.

library(dplyr)

TimeSheet <- tibble::tribble(
  
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  
  42, "12/11/2019 12:20", "12/11/2019 12:54",
  
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  
  42, "12/11/2019 07:20", "12/11/2019 10:00"
  
)

I think I understand what you want to achieve but I am requesting that you try to solve this problem yourself and, if you get stuck, come back with the code you have written and an explanation of the specific problem you cannot solve.

You're not being very specific but I think you are trying to do something like this

library(dplyr)
library(lubridate)
library(fuzzyjoin)

TimeSheet <- tibble::tribble(
    
    ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
    
    45, "17/11/2019 08:40", "17/11/2019 09:27",
    
    42, "17/11/2019 08:15", "17/11/2019 08:40",
    
    32, "17/11/2019 07:20", "17/11/2019 08:15",
    
    35, "17/11/2019 05:00", "17/11/2019 07:20",
    
    42, "14/11/2019 05:40", "14/11/2019 07:16",
    
    42, "12/11/2019 12:20", "12/11/2019 12:54",
    
    40, "12/11/2019 10:00", "12/11/2019 12:20",
    
    42, "12/11/2019 07:20", "12/11/2019 10:00"
    
)

TimeSheet <- TimeSheet %>% 
    mutate_at(vars(Start.Time, Actual.End.Time), dmy_hm)

colleague_minute <- data.frame(minute = seq.POSIXt(min(TimeSheet$Start.Time),
                                                   max(TimeSheet$Actual.End.Time), "min")) %>%
    fuzzy_left_join(TimeSheet, by = c("minute" = "Start.Time",
                                      "minute" = "Actual.End.Time"),
                    match_fun = c(`>=`, `<=`)) %>% 
    group_by(minute) %>% 
    summarise(n_collegues = n(),
              ids = paste(Colleague.Number, collapse = ",")) %>% 
    filter(ids != "NA")

colleague_minute
#> # A tibble: 700 x 3
#>    minute              n_collegues ids  
#>    <dttm>                    <int> <chr>
#>  1 2019-11-12 07:20:00           1 42   
#>  2 2019-11-12 07:21:00           1 42   
#>  3 2019-11-12 07:22:00           1 42   
#>  4 2019-11-12 07:23:00           1 42   
#>  5 2019-11-12 07:24:00           1 42   
#>  6 2019-11-12 07:25:00           1 42   
#>  7 2019-11-12 07:26:00           1 42   
#>  8 2019-11-12 07:27:00           1 42   
#>  9 2019-11-12 07:28:00           1 42   
#> 10 2019-11-12 07:29:00           1 42   
#> # ā€¦ with 690 more rows
1 Like

Thats great! Thank you.

Just the R script needed.

Hi FJCC - In future with some practice, yes you're right. Im relatively new to R!

If your question's been answered (even if by you), would you mind choosing a solution? (See FAQ below for how).

Having questions checked as resolved makes it a bit easier to navigate the site visually and see which threads still need help.

Thanks

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.