Check previous row in datetime, if time is greater than a certain value, place in a group and take its duration (R, dplyr)

I have a dataset, df (the full dataset contains over 4000 rows):

  DATEB

  9/9/2019 7:51:58 PM
  9/9/2019 7:51:59 PM
  9/9/2019 7:51:59 PM
  9/9/2019 7:52:00 PM
  9/9/2019 7:52:01 PM
  9/9/2019 7:52:01 PM
  9/9/2019 7:52:02 PM
  9/9/2019 7:52:03 PM
  9/9/2019 7:54:00 PM
  9/9/2019 7:54:02 PM
  9/10/2019 8:00:00 PM

I wish to group the rows so that a new group starts whenever a row is more than 10 seconds after the previous row, and then take the duration of each newly formed group.

Desired output:

Group   Duration

 a       5 sec
 b       2 sec
 c       0 sec




 dput:


  structure(list(DATEB = structure(c(2L, 3L, 3L, 4L, 5L, 5L, 6L, 
  7L, 8L, 9L, 1L), .Label = c("      9/10/2019 8:00:00 PM", "      9/9/2019 7:51:58 PM", 
  "      9/9/2019 7:51:59 PM", "      9/9/2019 7:52:00 PM", "      9/9/2019 7:52:01 PM", 
  "      9/9/2019 7:52:02 PM", "      9/9/2019 7:52:03 PM", "      9/9/2019 7:54:00 PM", 
  "      9/9/2019 7:54:02 PM"), class = "factor")), class = "data.frame", row.names = c(NA, 
  -11L))

I have tried the code below, which works well, except that the times seem to be cut off right before 2 minutes, so the durations are not accurate (some of the groups last longer than 2 minutes).

   library(dplyr)
   df %>%
     mutate(DATEB = lubridate::mdy_hms(DATEB),
            temp = cut(DATEB, breaks = "2 mins")) %>%
     group_by(temp) %>%
     summarise(duration = difftime(max(DATEB), min(DATEB), units = "secs"))

Any suggestion is appreciated.


df <- structure(list(DATEB = structure(c(2L, 3L, 3L, 4L, 5L, 5L, 6L,
  7L, 8L, 9L, 1L), .Label = c("      9/10/2019 8:00:00 PM", "      9/9/2019 7:51:58 PM",
  "      9/9/2019 7:51:59 PM", "      9/9/2019 7:52:00 PM", "      9/9/2019 7:52:01 PM",
  "      9/9/2019 7:52:02 PM", "      9/9/2019 7:52:03 PM", "      9/9/2019 7:54:00 PM",
  "      9/9/2019 7:54:02 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
  -11L))

library(dplyr)
library(lubridate)

# parse the factor column into date-times
df2 <- mutate(df,
              DATEB = lubridate::mdy_hms(DATEB))

# gap between each row and the one before it
df2$time_since_last_row <- df2$DATEB - lag(df2$DATEB)
df2$time_since_last_row[[1]] <- 0 # replace the first NA
df2$group_10s <- 0

# start a new group whenever the gap to the previous row exceeds 10 seconds
for (i in 2:nrow(df2)) {
  if (df2$time_since_last_row[[i]] > seconds(10)) {
    df2$group_10s[[i]] <- df2$group_10s[[i - 1]] + 1
  } else {
    df2$group_10s[[i]] <- df2$group_10s[[i - 1]]
  }
}

df3 <- group_by(df2,
                group_10s) %>%
  summarise(volume_in_group=n(),
            min_DATEB=min(DATEB),
            max_DATEB=max(DATEB),
            group_duration = max_DATEB - min_DATEB)
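
For what it's worth, the loop can probably be replaced by a vectorised cumsum(). This is only a minimal sketch, assuming the rows are already sorted by time. The small data frame below just repeats the sample timestamps from the question as plain strings, and gap_secs and result are names made up for this example.

```r
library(dplyr)
library(lubridate)

# Stand-in for df: the sample timestamps from the question, as plain strings
df <- data.frame(
  DATEB = c("9/9/2019 7:51:58 PM", "9/9/2019 7:51:59 PM", "9/9/2019 7:51:59 PM",
            "9/9/2019 7:52:00 PM", "9/9/2019 7:52:01 PM", "9/9/2019 7:52:01 PM",
            "9/9/2019 7:52:02 PM", "9/9/2019 7:52:03 PM", "9/9/2019 7:54:00 PM",
            "9/9/2019 7:54:02 PM", "9/10/2019 8:00:00 PM"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    DATEB = mdy_hms(DATEB),
    # gap to the previous row in seconds; the first row has no predecessor, so use 0
    gap_secs = c(0, diff(as.numeric(DATEB))),
    # the running count of gaps longer than 10 seconds serves as the group id
    group_10s = cumsum(gap_secs > 10)
  ) %>%
  group_by(group_10s) %>%
  summarise(
    volume_in_group = n(),
    # duration of each group as a plain number of seconds
    group_duration = as.numeric(difftime(max(DATEB), min(DATEB), units = "secs"))
  )

print(result)
```

On the sample data this gives three groups with durations of 5, 2 and 0 seconds, which matches the desired output in the question.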

Thank you! Is there a way to edit the code so that all the durations are reported in seconds, for example with units = "secs"?

df3 <- group_by(df2,
                group_10s) %>%
  summarise(volume_in_group=n(),
            min_DATEB=min(DATEB),
            max_DATEB=max(DATEB),
            group_duration = difftime(max_DATEB,min_DATEB,units = "secs"))
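
As a side note on why the units can differ: subtracting two date-times with `-` lets R choose the unit from the size of the difference, whereas difftime() with units = "secs" pins it. A small illustration, using two timestamps taken from the sample data (t1, t2, auto_diff and secs_diff are just example names):

```r
t1 <- as.POSIXct("2019-09-09 19:51:58", tz = "UTC")
t2 <- as.POSIXct("2019-09-09 19:54:02", tz = "UTC")

auto_diff <- t2 - t1                           # unit picked automatically
secs_diff <- difftime(t2, t1, units = "secs")  # unit fixed to seconds

units(auto_diff)       # "mins", because the gap is over a minute but under an hour
units(secs_diff)       # "secs"
as.numeric(secs_diff)  # 124
```

Wrapping the result in as.numeric() drops the difftime class, which may be handy if the durations need to be summed or plotted later.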