Adding another row's data into a calculation

Hello folks,

From this data, I'm trying to calculate the time that has passed between p3_end_time and the following p2_request_time . For example, in row 3 of the excel sheet from Github link below, I'd like to know how much time lapsed from p3_end_time (11:23:25 AM and in row 1) and the following p2_request_time (11:29:44 AM in row 2). In some cases, p3_end_time will occur before the following p2_request_time and therefore will not need to be calculated.

The data is located here: GitHub - derek-l-thomas/time.data: drivers

Thank you!

See the FAQ: How to do a minimal reproducible example reprex for beginners. If you include cut-and-paste data rather than a download-convert link you will attract answers.

It is not actually clear what you want as I cannot find the times you mention but this might work. I converted your spreadsheet into a .csv file for ease of dealing with times and date

library(lubridate)
library(dplyr)

dat1 <-   read.csv("derek_sample.csv")

 request <-  dmy_hms(with(dat1, paste(p2_request_date, p2_request_time, sep = " " )))
 
 begin <- dmy_hms(with(dat1, paste(p3_begin_date, p3_begin_time, sep = " " ))) 

 end <- dmy_hms(with(dat1, paste(p3_end_date, p3_end_time, sep = " " ))) 
 
 trip <-   dat1$trip
 
 pay <-   dat1$gross_orig_driver_pay
 
 dat2 <-   data.frame(request, begin, end, trip, pay)
 
 dat3 <-   mutate(dat2, dur =  lag(request) - end)
  

Is this what you mean?

library(dplyr)
library(readxl)
library(hms)

link <- "https://github.com/derek-l-thomas/time.data/raw/main/derek_sample.xlsx"

download.file(link, "driver_data.xlsx")

driver_data <- readxl::read_excel("driver_data.xlsx")

driver_data %>% 
    mutate(across(contains("time"), as_hms),
           across(contains("date"), as.Date),
           p2_request = as.POSIXct(paste(p2_request_date, p2_request_time)),
           p3_begin = as.POSIXct(paste(p3_begin_date, p3_begin_time)),
           p3_end = as.POSIXct(paste(p3_end_date, p3_end_time))) %>% 
    select(p2_request, p3_begin, p3_end, trip, gross_orig_driver_pay) %>% 
    mutate(time_passed = difftime(p2_request, lead(p3_end), units = "hour"))
#> # A tibble: 90 x 6
#>    p2_request          p3_begin            p3_end              trip     
#>    <dttm>              <dttm>              <dttm>              <chr>    
#>  1 2021-03-03 12:10:33 2021-03-03 12:15:39 2021-03-03 12:33:17 completed
#>  2 2021-03-03 11:29:21 2021-03-03 11:34:16 2021-03-03 12:06:27 completed
#>  3 2021-03-03 10:44:26 2021-03-03 10:47:41 2021-03-03 11:17:06 completed
#>  4 2021-03-03 10:08:28 2021-03-03 10:11:47 2021-03-03 10:42:37 completed
#>  5 2021-03-03 09:42:02 2021-03-03 09:56:46 2021-03-03 10:02:06 completed
#>  6 2021-03-03 09:32:20 2021-03-03 09:38:45 2021-03-03 09:46:05 completed
#>  7 2021-03-03 08:45:00 2021-03-03 08:43:39 2021-03-03 09:30:22 completed
#>  8 2021-03-03 07:30:00 2021-03-03 07:27:44 2021-03-03 08:24:32 completed
#>  9 2021-03-03 06:22:22 2021-03-03 06:32:16 2021-03-03 07:11:43 completed
#> 10 2021-03-02 15:11:13 2021-03-02 15:16:55 2021-03-02 15:43:38 completed
#> # … with 80 more rows, and 2 more variables: gross_orig_driver_pay <dbl>,
#> #   time_passed <drtn>

Created on 2021-04-26 by the reprex package (v2.0.0)

If this doesn't solve your problem, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

Hi! Thank you. I'm new here and trying to figure out the ropes, but want to be helpful in both my asks and maybe helping others one day. This look right?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(readxl)
library(hms)
library(readxl)

link <- "https://github.com/derek-l-thomas/time.data/raw/main/derek_sample.xlsx"
download.file(link, "driver_data.xlsx")

driver_data <- readxl::read_excel("driver_data.xlsx")

This is super helpful - thank you. I noticed how you were able to combine time and date, which seems to make this much easier.

(Excuse me for still learning how to paste the table in here, and for not clearly explaining the problem.)

Using row 2 of what you posted, the time_passed that you calculated should be 4 minutes and 6 seconds. That would be the time that's passed between the end of a driver's trip (p3_end, in row 2) and her next sequential request (p2_request, in row 1). Does that make sense?

Forgive me. I'm still struggling with pasting a data table using Tribble. I'm sure it's not as complicated as I'm making it. My example wasn't clear, but this is really helpful!

Using dat3 from your code above, and in row 2, do you know what unit 246 represents?

It should be 4 minutes and 6 seconds. That would be the time that's passed between the end of a driver's trip ("end", in row 2, 2021-03-03 12:06:27) and her next sequential request ("request", in row 1, 2021-03-03 12:10:33). Of course this means row 1 "dur" should be NA, since there's no data recorded after this trip.

Does that make sense?

Thank you!

That is exactly the result I get with my code but the units are "hours", for presentation purposes I can change the format so it becomes more evident to you.

library(dplyr)
library(readxl)
library(hms)

link <- "https://github.com/derek-l-thomas/time.data/raw/main/derek_sample.xlsx"

download.file(link, "driver_data.xlsx")

driver_data <- readxl::read_excel("driver_data.xlsx")

driver_data %>% 
    mutate(across(contains("time"), as_hms),
           across(contains("date"), as.Date),
           p2_request = as.POSIXct(paste(p2_request_date, p2_request_time)),
           p3_begin = as.POSIXct(paste(p3_begin_date, p3_begin_time)),
           p3_end = as.POSIXct(paste(p3_end_date, p3_end_time))) %>% 
    select(p2_request, p3_begin, p3_end, trip, gross_orig_driver_pay) %>% 
    mutate(time_passed = format(as.POSIXct(as.numeric(difftime(lag(p2_request), p3_end, units = 'secs')), 
                                           origin = '1970-01-01', tz = 'UTC'), '%M:%S')) %>% 
    select(time_passed, everything())
#> # A tibble: 90 x 6
#>    time_passed p2_request          p3_begin            p3_end              trip 
#>    <chr>       <dttm>              <dttm>              <dttm>              <chr>
#>  1 <NA>        2021-03-03 12:10:33 2021-03-03 12:15:39 2021-03-03 12:33:17 comp…
#>  2 04:06       2021-03-03 11:29:21 2021-03-03 11:34:16 2021-03-03 12:06:27 comp…
#>  3 12:15       2021-03-03 10:44:26 2021-03-03 10:47:41 2021-03-03 11:17:06 comp…
#>  4 01:49       2021-03-03 10:08:28 2021-03-03 10:11:47 2021-03-03 10:42:37 comp…
#>  5 06:22       2021-03-03 09:42:02 2021-03-03 09:56:46 2021-03-03 10:02:06 comp…
#>  6 55:57       2021-03-03 09:32:20 2021-03-03 09:38:45 2021-03-03 09:46:05 comp…
#>  7 01:58       2021-03-03 08:45:00 2021-03-03 08:43:39 2021-03-03 09:30:22 comp…
#>  8 20:28       2021-03-03 07:30:00 2021-03-03 07:27:44 2021-03-03 08:24:32 comp…
#>  9 18:17       2021-03-03 06:22:22 2021-03-03 06:32:16 2021-03-03 07:11:43 comp…
#> 10 38:44       2021-03-02 15:11:13 2021-03-02 15:16:55 2021-03-02 15:43:38 comp…
#> # … with 80 more rows, and 1 more variable: gross_orig_driver_pay <dbl>

Created on 2021-04-27 by the reprex package (v2.0.0)

Ah! That's it - thank you!!

The only issue that remains is that, in many cases, p3_end_time occurred before the subsequent p2_request_time. Eg, the request time in row 89 (2021-02-13 06:36:43) occurred before the p3_end_time in row 90. (2021-02-13 06:45:36) and therefore would be NA.

Do you know where the 50:14 comes from in row 89?

Is the absolute value of the time difference, you can replace the negative values for NA before changing format.

library(dplyr)
library(readxl)
library(hms)

link <- "https://github.com/derek-l-thomas/time.data/raw/main/derek_sample.xlsx"

download.file(link, "driver_data.xlsx")

driver_data <- readxl::read_excel("driver_data.xlsx")

driver_data %>% 
    mutate(across(contains("time"), as_hms),
           across(contains("date"), as.Date),
           p2_request = as.POSIXct(paste(p2_request_date, p2_request_time)),
           p3_begin = as.POSIXct(paste(p3_begin_date, p3_begin_time)),
           p3_end = as.POSIXct(paste(p3_end_date, p3_end_time))) %>% 
    select(p2_request, p3_begin, p3_end, trip, gross_orig_driver_pay) %>% 
    mutate(time_passed = as.numeric(difftime(lag(p2_request), p3_end, units = 'secs')),
           time_passed = if_else(time_passed < 0, NA_real_, time_passed),
           time_passed = format(as.POSIXct(time_passed, 
                                           origin = '1970-01-01', tz = 'UTC'), '%M:%S')) %>% 
    select(time_passed, everything())
#> # A tibble: 90 x 6
#>    time_passed p2_request          p3_begin            p3_end              trip 
#>    <chr>       <dttm>              <dttm>              <dttm>              <chr>
#>  1 <NA>        2021-03-03 12:10:33 2021-03-03 12:15:39 2021-03-03 12:33:17 comp…
#>  2 04:06       2021-03-03 11:29:21 2021-03-03 11:34:16 2021-03-03 12:06:27 comp…
#>  3 12:15       2021-03-03 10:44:26 2021-03-03 10:47:41 2021-03-03 11:17:06 comp…
#>  4 01:49       2021-03-03 10:08:28 2021-03-03 10:11:47 2021-03-03 10:42:37 comp…
#>  5 06:22       2021-03-03 09:42:02 2021-03-03 09:56:46 2021-03-03 10:02:06 comp…
#>  6 <NA>        2021-03-03 09:32:20 2021-03-03 09:38:45 2021-03-03 09:46:05 comp…
#>  7 01:58       2021-03-03 08:45:00 2021-03-03 08:43:39 2021-03-03 09:30:22 comp…
#>  8 20:28       2021-03-03 07:30:00 2021-03-03 07:27:44 2021-03-03 08:24:32 comp…
#>  9 18:17       2021-03-03 06:22:22 2021-03-03 06:32:16 2021-03-03 07:11:43 comp…
#> 10 38:44       2021-03-02 15:11:13 2021-03-02 15:16:55 2021-03-02 15:43:38 comp…
#> # … with 80 more rows, and 1 more variable: gross_orig_driver_pay <dbl>

Created on 2021-04-27 by the reprex package (v2.0.0)

1 Like

Spot on. Cutting and pasting into RStudio runs perfectly.

1 Like

Hi! Thanks again - the above was perfect. However, I did make a mistake. My apologies..

P1 should be moved up one line. That is, for the very last trip (that ended on 03-03 at 12:33:17pm), P1 should be 4:06, and so on.