Create start and endtime columns based on multiple conditions in R (dplyr, lubridate)

tanishahudson1 · February 5, 2020, 4:09am

I have a dataset, df

 Read	   Box	     ID	     Time
 T	   out               10/1/2019 9:00:01 AM
 T	   out		     10/1/2019 9:00:02 AM
 T	   out		     10/1/2019 9:00:03 AM
 T	   out		     10/1/2019 9:02:59 AM
 T	   out		     10/1/2019 9:03:00 AM
 F			     10/1/2019 9:05:00 AM
 T	   out		     10/1/2019 9:06:00 AM
 T	   out		     10/1/2019 9:06:02 AM
 T	   in		     10/1/2019 9:07:00 AM
 T	   in		     10/1/2019 9:07:02 AM
 T         out		     10/1/2019 9:07:04 AM
 T         out		     10/1/2019 9:07:05 AM
 T	   out		     10/1/2019 9:07:06 AM
         hello	             10/1/2019 9:07:08 AM

Based on certain conditions within this dataset, I would like to create a startime column and an endtime column.
I would like to create a 'starttime' when the following occurs: Read == "T", Box == "out" and ID == ""
When the first instance of this condition occurs, a starttime will be generated. For example for this dataset, the starttime will be 10/1/2019 9:00:01 AM since this is where we see the desired conditions occurs first (Read = T, Box = out and ID = "" )
However, the moment when anyone of these conditions is not true, and endtime will be created. So the first endtime would occur right before row 6, where the time is 10/1/2019 9:03:00 AM. My ultimate goal is to then create a duration column for this.

This is my desired output:

  starttime                    endtime                     duration

  10/01/2019 9:00:01 AM        10/01/2019 9:03:00 AM       179 secs
  10/1/2019 9:06:00 AM         10/1/2019 9:06:02 AM        2 secs
  10/1/2019 9:07:04 AM         10/1/2019 9:07:06 AM        2 secs

dput:

  structure(list(Read = structure(c(3L, 3L, 3L, 3L, 3L, 2L, 3L, 
  3L, 3L, 3L, 4L, 4L, 3L, 1L), .Label = c("", "F", "T", "T "), class = "factor"), 
  Box = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 2L, 2L, 
  3L, 3L, 3L, 1L), .Label = c("", "in", "out"), class = "factor"), 
  ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 2L), .Label = c("", "hello"), class = "factor"), 
  Time = structure(1:14, .Label = c("10/1/2019 9:00:01 AM", 
 "10/1/2019 9:00:02 AM", "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", 
 "10/1/2019 9:03:00 AM", "10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", 
 "10/1/2019 9:06:02 AM", "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", 
 "10/1/2019 9:07:04 AM", "10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", 
 "10/1/2019 9:07:08 AM"), class = "factor")), class = "data.frame", row.names = c(NA, 
 -14L))

I think overall, I would have to create a loop. I believe I have the thought process correct, just unsure of how to formulate the code. This is what I am trying:

 df2 <- mutate(df,
      Date = lubridate::mdy_hms(Date))




   for ( i in 2:nrow(df2))
    {
  if(df2$Read[[i]] == 'T')
  
     }

I think this may be a start (just placing my conditions within the loop, I am not sure how to complete this)

Any suggestion is appreciated.

technocrat · February 5, 2020, 7:29am

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(stringr)) 
obj <-   structure(list(Read = structure(c(3L, 3L, 3L, 3L, 3L, 2L, 3L, 
  3L, 3L, 3L, 4L, 4L, 3L, 1L), .Label = c("", "F", "T", "T "), class = "factor"), 
  Box = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 2L, 2L, 
  3L, 3L, 3L, 1L), .Label = c("", "in", "out"), class = "factor"), 
  ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 2L), .Label = c("", "hello"), class = "factor"), 
  Time = structure(1:14, .Label = c("10/1/2019 9:00:01 AM", 
 "10/1/2019 9:00:02 AM", "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", 
 "10/1/2019 9:03:00 AM", "10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", 
 "10/1/2019 9:06:02 AM", "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", 
 "10/1/2019 9:07:04 AM", "10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", 
 "10/1/2019 9:07:08 AM"), class = "factor")), class = "data.frame", row.names = c(NA, 
 -14L))
obj %>%  
mutate(Read = as.logical(Read)) %>% 
mutate(Box = as.character(Box)) %>% 
mutate(ID = as.character(ID)) %>% 
mutate(Time = as.character(Time)) %>% 
mutate(Time = str_replace(Time," AM","")) %>% 
mutate(Time = mdy_hms(Time)) %>% 
mutate(starttime = as_datetime(ifelse(Read == TRUE & Box == "out" & ID == "", Time, NA))) %>% 
mutate(endtime = as_datetime(ifelse(Read != TRUE | Box != "out" | ID != "", Time, NA)))
#>     Read Box    ID                Time           starttime             endtime
#> 1   TRUE out       2019-10-01 09:00:01 2019-10-01 09:00:01                <NA>
#> 2   TRUE out       2019-10-01 09:00:02 2019-10-01 09:00:02                <NA>
#> 3   TRUE out       2019-10-01 09:00:03 2019-10-01 09:00:03                <NA>
#> 4   TRUE out       2019-10-01 09:02:59 2019-10-01 09:02:59                <NA>
#> 5   TRUE out       2019-10-01 09:03:00 2019-10-01 09:03:00                <NA>
#> 6  FALSE           2019-10-01 09:05:00                <NA> 2019-10-01 09:05:00
#> 7   TRUE out       2019-10-01 09:06:00 2019-10-01 09:06:00                <NA>
#> 8   TRUE out       2019-10-01 09:06:02 2019-10-01 09:06:02                <NA>
#> 9   TRUE  in       2019-10-01 09:07:00                <NA> 2019-10-01 09:07:00
#> 10  TRUE  in       2019-10-01 09:07:02                <NA> 2019-10-01 09:07:02
#> 11    NA out       2019-10-01 09:07:04                <NA>                <NA>
#> 12    NA out       2019-10-01 09:07:05                <NA>                <NA>
#> 13  TRUE out       2019-10-01 09:07:06 2019-10-01 09:07:06                <NA>
#> 14    NA     hello 2019-10-01 09:07:08                <NA> 2019-10-01 09:07:08

^{Created on 2020-02-04 by the reprex package (v0.3.0)}

tanishahudson1 · February 5, 2020, 2:55pm

Thank you, I will try

system · February 12, 2020, 2:55pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.