Calc Start and End Times

MW19 · November 18, 2019, 2:13pm

Hi,

I have a excel file with a list of employees, with the date/time they started work and date/time when they left work.

Example: Colleague 1, Started: 18/11/2019 07:00, Ended 18/11/2019 09:00. Therefore they worked across the hours of 7,8 & 9.

Is it possible to create a summary table such as:

Date/Time Colleagues at work
18/11/2019 07:00 45
18/11/2019 08:00 60
18/11/2019 09:00 45

Thanks,

andresrcs · November 18, 2019, 2:39pm

Can you please share a small part of the data set in a copy-paste friendly format?

In case you don't know how to do it, there are many options, which include:

If you have stored the data set in some R object, dput function is very handy.
In case the data set is in a spreadsheet, check out the datapasta package. Take a look at this link.

MW19 · November 18, 2019, 2:55pm

Hows this?

TimeSheet <- tibble::tribble(
~Colleague.Number, ~Start.Time.(UTC), ~Actual.End.Time.(UTC),
45, "17/11/2019 08:40", "17/11/2019 09:27",
42, "17/11/2019 08:15", "17/11/2019 08:40",
32, "17/11/2019 07:20", "17/11/2019 08:15",
35, "17/11/2019 05:00", "17/11/2019 07:20",
42, "14/11/2019 05:40", "14/11/2019 07:16",
42, "12/11/2019 12:20", "12/11/2019 12:54",
40, "12/11/2019 10:00", "12/11/2019 12:20",
42, "12/11/2019 07:20", "12/11/2019 10:00"
)

FJCC · November 18, 2019, 4:00pm

I think this does what you want. Notice I changed the start time of one entry so I could confirm that the code would return values other than zero and one.

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
TimeSheet <- tibble::tribble(
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  42, "12/11/2019 12:00", "12/11/2019 12:54",
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  42, "12/11/2019 07:20", "12/11/2019 10:00"
)
  
TimeSheet <- TimeSheet %>% mutate(Start.Time = lubridate::dmy_hm(Start.Time, tz = "UTC"),
                                  Actual.End.Time = lubridate::dmy_hm(Actual.End.Time, tz = "UTC"))

Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-17 10:00:00", tz = "UTC"), "h")

Seq2 <- seq.POSIXt(as.POSIXct("2019-11-14 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-14 08:00:00", tz = "UTC"), "h")

Seq3 <- seq.POSIXt(as.POSIXct("2019-11-12 07:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-12 13:00:00", tz = "UTC"), "h")

DF <- data.frame(DateTime = c(Seq1, Seq2, Seq3))

DF2 <- DF %>% group_by(DateTime) %>% 
  mutate(CollAtWork = sum(DateTime >= TimeSheet$Start.Time & DateTime < TimeSheet$Actual.End.Time))
DF2
#> # A tibble: 17 x 2
#> # Groups:   DateTime [17]
#>    DateTime            CollAtWork
#>    <dttm>                   <int>
#>  1 2019-11-17 05:00:00          1
#>  2 2019-11-17 06:00:00          1
#>  3 2019-11-17 07:00:00          1
#>  4 2019-11-17 08:00:00          1
#>  5 2019-11-17 09:00:00          1
#>  6 2019-11-17 10:00:00          0
#>  7 2019-11-14 05:00:00          0
#>  8 2019-11-14 06:00:00          1
#>  9 2019-11-14 07:00:00          1
#> 10 2019-11-14 08:00:00          0
#> 11 2019-11-12 07:00:00          0
#> 12 2019-11-12 08:00:00          1
#> 13 2019-11-12 09:00:00          1
#> 14 2019-11-12 10:00:00          1
#> 15 2019-11-12 11:00:00          1
#> 16 2019-11-12 12:00:00          2
#> 17 2019-11-12 13:00:00          0

^{Created on 2019-11-18 by the reprex package (v0.3.0.9000)}

MW19 · November 18, 2019, 4:37pm

Thanks! This is great.

MW19 · November 18, 2019, 5:52pm

Further to this, how would you add the colleague during the intervals?

For example, if we take the above in DF2, how can we add another column with the colleague that it refers to?

FJCC · November 19, 2019, 12:08am

I think this gets to the goal, though there is probably a more elegant method. It assumes there is only one person present at a time and the person whose shift ends exactly on the hour is not counted as present for that last instant, it is the person whose shifts begins on the hour who is listed.

library(dplyr)

TimeSheet <- tibble::tribble(
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  42, "12/11/2019 12:20", "12/11/2019 12:54",
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  42, "12/11/2019 07:20", "12/11/2019 10:00"
)
  
TimeSheet <- TimeSheet %>% mutate(Start.Time = lubridate::dmy_hm(Start.Time, tz = "UTC"),
                                  Actual.End.Time = lubridate::dmy_hm(Actual.End.Time, tz = "UTC"))

Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-17 10:00:00", tz = "UTC"), "h")

Seq2 <- seq.POSIXt(as.POSIXct("2019-11-14 05:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-14 08:00:00", tz = "UTC"), "h")

Seq3 <- seq.POSIXt(as.POSIXct("2019-11-12 07:00:00", tz = "UTC"), 
                   as.POSIXct("2019-11-12 13:00:00", tz = "UTC"), "h")

DF <- data.frame(DateTime = c(Seq1, Seq2, Seq3))

DF2 <- DF %>% group_by(DateTime) %>% 
  mutate(CollAtWork = sum(DateTime >= TimeSheet$Start.Time & DateTime < TimeSheet$Actual.End.Time))

library(lubridate)

HourColleague <- TimeSheet %>% mutate(ID = 1:nrow(TimeSheet), 
                     HourStart = hour(ceiling_date(Start.Time, unit = "hour")),
                     HourEnd = hour(floor_date(Actual.End.Time - 60, unit = "hour"))) %>%
  filter(HourStart <= HourEnd) %>% 
  group_by(ID) %>% 
  mutate(HourSeq = list(tidyr::full_seq(c(HourStart, HourEnd), 1))) %>% 
  tidyr::unnest(cols = HourSeq) %>% 
  mutate(DateTime_bin = make_datetime(year = year(Start.Time),
                                      month = month(Start.Time),
                                      day = day(Start.Time),
                                      hour = HourSeq)) %>%
  ungroup()  
HourColleague <- select(HourColleague, DateTime = DateTime_bin, Colleague.Number)

HourColleague
#> # A tibble: 12 x 2
#>    DateTime            Colleague.Number
#>    <dttm>                         <dbl>
#>  1 2019-11-17 09:00:00               45
#>  2 2019-11-17 08:00:00               32
#>  3 2019-11-17 05:00:00               35
#>  4 2019-11-17 06:00:00               35
#>  5 2019-11-17 07:00:00               35
#>  6 2019-11-14 06:00:00               42
#>  7 2019-11-14 07:00:00               42
#>  8 2019-11-12 10:00:00               40
#>  9 2019-11-12 11:00:00               40
#> 10 2019-11-12 12:00:00               40
#> 11 2019-11-12 08:00:00               42
#> 12 2019-11-12 09:00:00               42

DF3 <- left_join(DF2, HourColleague)
#> Joining, by = "DateTime"
DF3
#> # A tibble: 17 x 3
#> # Groups:   DateTime [17]
#>    DateTime            CollAtWork Colleague.Number
#>    <dttm>                   <int>            <dbl>
#>  1 2019-11-17 05:00:00          1               35
#>  2 2019-11-17 06:00:00          1               35
#>  3 2019-11-17 07:00:00          1               35
#>  4 2019-11-17 08:00:00          1               32
#>  5 2019-11-17 09:00:00          1               45
#>  6 2019-11-17 10:00:00          0               NA
#>  7 2019-11-14 05:00:00          0               NA
#>  8 2019-11-14 06:00:00          1               42
#>  9 2019-11-14 07:00:00          1               42
#> 10 2019-11-14 08:00:00          0               NA
#> 11 2019-11-12 07:00:00          0               NA
#> 12 2019-11-12 08:00:00          1               42
#> 13 2019-11-12 09:00:00          1               42
#> 14 2019-11-12 10:00:00          1               40
#> 15 2019-11-12 11:00:00          1               40
#> 16 2019-11-12 12:00:00          1               40
#> 17 2019-11-12 13:00:00          0               NA

^{Created on 2019-11-18 by the reprex package (v0.3.0.9000)}

MW19 · November 20, 2019, 10:50am

Can you see why this would return N/A?

library(dplyr)
library(readxl)

TimeSheet <- read_excel("xxxx")

TimeSheet <- TimeSheet %>% mutate(StartTime = lubridate::ymd_hms(StartTime, tz = "UTC"),
ActualEndTime = lubridate::ymd_hms(ActualEndTime, tz = "UTC"))

Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"),
as.POSIXct("2019-11-17 23:00:00", tz = "UTC"), "hour")

DF <- data.frame(DateTime = c(Seq1))

DF2 <- DF %>% group_by(DateTime) %>%
mutate(CollAtWork= sum(DateTime >= TimeSheet$StartTime & DateTime < TimeSheet$ActualEndTime))

DF2

FJCC · November 20, 2019, 2:45pm

Without seeing your data, I can only guess. The first place I would look is the output of ymd_hms(). You are reading in an Excel file and Excel will silently change the format of dates. The imported dates may not match the ymd format. If the problem is not there, please post a Reproducible Example with data showing the problem.

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

MW19 · November 22, 2019, 5:02pm

Hi FJCC.

Firstly apologies , relatively new to this forum.

The below code is perfect, it works great. Now I want to look at it by minute. So I want to see the number of colleagues by minute, and the first colleague by that minute.

library(dplyr)

TimeSheet <- tibble::tribble(
  
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  
  42, "12/11/2019 12:20", "12/11/2019 12:54",
  
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  
  42, "12/11/2019 07:20", "12/11/2019 10:00"
  
)

TimeSheet <- TimeSheet %>% mutate(Start.Time = lubridate::dmy_hm(Start.Time, tz = "UTC"),
                                  
                                  Actual.End.Time = lubridate::dmy_hm(Actual.End.Time, tz = "UTC"))



Seq1 <- seq.POSIXt(as.POSIXct("2019-11-17 05:00:00", tz = "UTC"), 
                   
                   as.POSIXct("2019-11-17 10:00:00", tz = "UTC"), "h")



Seq2 <- seq.POSIXt(as.POSIXct("2019-11-14 05:00:00", tz = "UTC"), 
                   
                   as.POSIXct("2019-11-14 08:00:00", tz = "UTC"), "h")



Seq3 <- seq.POSIXt(as.POSIXct("2019-11-12 07:00:00", tz = "UTC"), 
                   
                   as.POSIXct("2019-11-12 13:00:00", tz = "UTC"), "h")



DF <- data.frame(DateTime = c(Seq1, Seq2, Seq3))



DF2 <- DF %>% group_by(DateTime) %>% 
  
  mutate(CollAtWork = sum(DateTime >= TimeSheet$Start.Time & DateTime < TimeSheet$Actual.End.Time))



library(lubridate)



HourColleague <- TimeSheet %>% mutate(ID = 1:nrow(TimeSheet), 
                                      
                                      HourStart = hour(ceiling_date(Start.Time, unit = "hour")),
                                      
                                      HourEnd = hour(floor_date(Actual.End.Time - 60, unit = "hour"))) %>%
  
  filter(HourStart <= HourEnd) %>% 
  
  group_by(ID) %>% 
  
  mutate(HourSeq = list(tidyr::full_seq(c(HourStart, HourEnd), 1))) %>% 
  
  tidyr::unnest(cols = HourSeq) %>% 
  
  mutate(DateTime_bin = make_datetime(year = year(Start.Time),
                                      
                                      month = month(Start.Time),
                                      
                                      day = day(Start.Time),
                                      
                                      hour = HourSeq)) %>%
  
  ungroup()  

HourColleague <- select(HourColleague, DateTime = DateTime_bin, Colleague.Number)



HourColleague

DF3 <- left_join(DF2, HourColleague)

#> Joining, by = "DateTime"

DF3

Is this possible?

MW19 · November 24, 2019, 6:55pm

Hi @FJCC - Would you be able to help? I really appreciate the help you have given so far!

FJCC · November 25, 2019, 3:19am

Please show the actual code you have written so we can address the specific problem you are having. The first code I posted calculated how many people are present for each time period. The time sequences in that case were for every hour but you can change that to minutes by changing the last parameter of seq.POSIXt() from "h" to "min".

I am not sure what you mean by "the first colleague by that minute" since the data have a resolution of one minute, so a person can only be present or not present in a particular minute. There is no first and second person within a minute.

MW19 · November 25, 2019, 7:43am

Hi @FJCC, thanks for coming back.

Basically I want to understand which colleague was present during that particular minute. Where a colleague was not present, return a value of "NA" for example.

So where Colleague ID 45 was working from 0840 to 09:27 (row 1), I want every minute between 08:40 and 09:27 to display a value of: The value 1 to show how many colleagues were working, and second the colleague ID of 45 that was present.

library(dplyr)

TimeSheet <- tibble::tribble(
  
  ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
  
  45, "17/11/2019 08:40", "17/11/2019 09:27",
  
  42, "17/11/2019 08:15", "17/11/2019 08:40",
  
  32, "17/11/2019 07:20", "17/11/2019 08:15",
  
  35, "17/11/2019 05:00", "17/11/2019 07:20",
  
  42, "14/11/2019 05:40", "14/11/2019 07:16",
  
  42, "12/11/2019 12:20", "12/11/2019 12:54",
  
  40, "12/11/2019 10:00", "12/11/2019 12:20",
  
  42, "12/11/2019 07:20", "12/11/2019 10:00"
  
)

FJCC · November 25, 2019, 4:20pm

I think I understand what you want to achieve but I am requesting that you try to solve this problem yourself and, if you get stuck, come back with the code you have written and an explanation of the specific problem you cannot solve.

andresrcs · November 25, 2019, 4:22pm

You're not being very specific but I think you are trying to do something like this

library(dplyr)
library(lubridate)
library(fuzzyjoin)

TimeSheet <- tibble::tribble(
    
    ~Colleague.Number, ~Start.Time, ~Actual.End.Time,
    
    45, "17/11/2019 08:40", "17/11/2019 09:27",
    
    42, "17/11/2019 08:15", "17/11/2019 08:40",
    
    32, "17/11/2019 07:20", "17/11/2019 08:15",
    
    35, "17/11/2019 05:00", "17/11/2019 07:20",
    
    42, "14/11/2019 05:40", "14/11/2019 07:16",
    
    42, "12/11/2019 12:20", "12/11/2019 12:54",
    
    40, "12/11/2019 10:00", "12/11/2019 12:20",
    
    42, "12/11/2019 07:20", "12/11/2019 10:00"
    
)

TimeSheet <- TimeSheet %>% 
    mutate_at(vars(Start.Time, Actual.End.Time), dmy_hm)

colleague_minute <- data.frame(minute = seq.POSIXt(min(TimeSheet$Start.Time),
                                                   max(TimeSheet$Actual.End.Time), "min")) %>%
    fuzzy_left_join(TimeSheet, by = c("minute" = "Start.Time",
                                      "minute" = "Actual.End.Time"),
                    match_fun = c(`>=`, `<=`)) %>% 
    group_by(minute) %>% 
    summarise(n_collegues = n(),
              ids = paste(Colleague.Number, collapse = ",")) %>% 
    filter(ids != "NA")

colleague_minute
#> # A tibble: 700 x 3
#>    minute              n_collegues ids  
#>    <dttm>                    <int> <chr>
#>  1 2019-11-12 07:20:00           1 42   
#>  2 2019-11-12 07:21:00           1 42   
#>  3 2019-11-12 07:22:00           1 42   
#>  4 2019-11-12 07:23:00           1 42   
#>  5 2019-11-12 07:24:00           1 42   
#>  6 2019-11-12 07:25:00           1 42   
#>  7 2019-11-12 07:26:00           1 42   
#>  8 2019-11-12 07:27:00           1 42   
#>  9 2019-11-12 07:28:00           1 42   
#> 10 2019-11-12 07:29:00           1 42   
#> # … with 690 more rows

MW19 · November 25, 2019, 4:40pm

Thats great! Thank you.

Just the R script needed.

MW19 · November 25, 2019, 4:41pm

Hi FJCC - In future with some practice, yes you're right. Im relatively new to R!

mara · November 25, 2019, 4:49pm

If your question's been answered (even if by you), would you mind choosing a solution? (See FAQ below for how).

Having questions checked as resolved makes it a bit easier to navigate the site visually and see which threads still need help.

Thanks

system · December 2, 2019, 4:49pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.