Time diff not for different date variables but for duplicated rows

Two of R-Studio masters kindly helped me to find duplicated responses using this:

source.data <- 
data.frame(
  stringsAsFactors = FALSE,
               VRN = c("A8PWA","A8WTW","A9CGB",
                       "A9CGB","AJ18SHZ","AJ18VJU","AJ18VJU","AJ18VJU"),
  NPSResponseDate = c("2021-09-28 14:21:05",
                       "2021-10-14 13:04:01","2021-10-29 15:44:45",
                       "2021-12-15 11:21:59","2022-05-09 13:01:34","2022-04-21 12:50:06",
                       "2022-05-27 20:52:05","2022-07-15 23:34:26")
)
result <- source.data %>% 
  group_by(VRN) %>% 
  arrange(NPSResponseDate)%>% 
  mutate(Sequence = row_number(),
         Duplicated.VRN=n()>1) %>% 
  ungroup() %>% 
  arrange(VRN, NPSResponseDate)

but I personally think, it would be very useful for many R users to be able to count days (or hours) between responses (based on the NPSResponseDate) rather than just flagging respondents with repeated responses.
As a result, I would need something like this:

result <- data.frame(
  stringsAsFactors = FALSE,
               VRN = c("A8PWA","A8WTW","A9CGB",
                       "A9CGB","AJ18SHZ","AJ18VJU","AJ18VJU","AJ18VJU"),
   NPSResponseDate = c("2021-09-28 14:21:05",
                       "2021-10-14 13:04:01","2021-10-29 15:44:45",
                       "2021-12-15 11:21:59","2022-05-09 13:01:34","2022-04-21 12:50:06",
                       "2022-05-27 20:52:05","2022-07-15 23:34:26"),
      Days_between = c(NA,NA,46.8175263079029,
                       46.8175263079029,NA,36.3347094561032,36.3347094561032,
                       36.3347094561032)
)

result

Is it possible to do it in R?

Yes, it is possible to do in R. The example below uses the lag() function from the dplyr package to look at the timestamp from the previous row (within each group) to calculate the duration.

library(tidyverse)

d = data.frame(
  VRN = c("A8PWA","A8WTW","A9CGB",
          "A9CGB","AJ18SHZ","AJ18VJU","AJ18VJU","AJ18VJU"),
  NPSResponseDate = c("2021-09-28 14:21:05",
                      "2021-10-14 13:04:01","2021-10-29 15:44:45",
                      "2021-12-15 11:21:59","2022-05-09 13:01:34","2022-04-21 12:50:06",
                      "2022-05-27 20:52:05","2022-07-15 23:34:26")
  ) %>%
  # convert to POSIXct
  mutate(NPSResponseDate = as.POSIXct(NPSResponseDate)) %>%
  # for each VRN group: first record set to NA; others set to Date - Previous Date using lag()
  group_by(VRN) %>%
  arrange(NPSResponseDate) %>%
  mutate(Days_between = case_when(
    row_number() == 1 ~ NA_real_,
    TRUE ~ as.numeric(NPSResponseDate - lag(NPSResponseDate))
    )) %>%
  ungroup()

d
#> # A tibble: 8 × 3
#>   VRN     NPSResponseDate     Days_between
#>   <chr>   <dttm>                     <dbl>
#> 1 A8PWA   2021-09-28 14:21:05         NA  
#> 2 A8WTW   2021-10-14 13:04:01         NA  
#> 3 A9CGB   2021-10-29 15:44:45         NA  
#> 4 A9CGB   2021-12-15 11:21:59         46.9
#> 5 AJ18VJU 2022-04-21 12:50:06         NA  
#> 6 AJ18SHZ 2022-05-09 13:01:34         NA  
#> 7 AJ18VJU 2022-05-27 20:52:05         36.3
#> 8 AJ18VJU 2022-07-15 23:34:26         49.1

Note: the column was set to "Days_between" because each timestamp was on a different day (24 hrs or more difference) and the subtraction results in a difference of days. As shown below, the units can change depending on the span between timestamps (days, hours, minutes, seconds), so just be careful to watch your units when implementing.

as.POSIXct('2022-09-08 12:00:00') - as.POSIXct('2022-09-07 12:00:00')
#> Time difference of 1 days
as.POSIXct('2022-09-08 12:00:00') - as.POSIXct('2022-09-07 10:00:00')
#> Time difference of 1.083333 days
as.POSIXct('2022-09-08 12:00:00') - as.POSIXct('2022-09-07 22:00:00')
#> Time difference of 14 hours
as.POSIXct('2022-09-08 12:00:00') - as.POSIXct('2022-09-08 11:22:00')
#> Time difference of 38 mins
as.POSIXct('2022-09-08 12:00:00') - as.POSIXct('2022-09-08 11:59:35')
#> Time difference of 25 secs

Created on 2022-09-08 with reprex v2.0.2.9000

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.