Station Run Time: Calculating Arrival and Departure times of sampling stations

I work in fisheries oceanography. For most of my career, we have used excel to calculate our station run times. Linearly this goes as follows:

We depart from port : arrive at the first station: sample: depart from first station: repeat

Knowing how much we can time-wise (sampling time and running to the next station) is vital for us to get a general idea of how much we can do over a given area.

I've been trying to recreate the spreadsheet we use in r, hopefully to create a Shiny app one day that will be much easier to use. Here's my issue:

The uploaded image shows all the necessary columns of a larger data frame to calculate Departure and Arrival times.

Below is the code used to calculate both Departure and Arrival times. "Dp_Date" is a datetime variable used for your initial departure date (see in screeenshot). "Station == 0" is the port your departing from. I didn't include all of the Lat/Lon data, as it's not part of the question.

SRTdata <- SRTdata %>% mutate(Depart_DT = lubridate::as_datetime(if_else(Station == 0,as.character(Dp_Date),""))) %>% 
  mutate(Arrival_DT = lag(Depart_DT) + (Steam_T)) %>%
  mutate(Depart_DT = Arrival_DT + Station_Time)

Where I'm stumped is how to calculate future Departure and Arrival times as they are based on previous calculations of each other. I think I could do this via purrr:accumulate or purrr::map_df, but I'm stumped. This shouldn't be this difficult.

Thank you for your help.

It looks like an interesting problem but a screenshot of the data is no very useful. We really need some real data.

A handy way to supply some sample data is the dput() function. In the case of a large dataset something like dput(head(mydata, 100)) should supply the data we need. Just do dput(mydata) where mydata is your data. Copy the output and paste it here.

I think the logic of a solution is shown by the following example. I would leave the original departure at time zero, do the calculation, then add the appropriate date offset at the end. I used hours as the unit of time so I could use small integers and not hurt my brain.

library(dplyr)

library(tidyr)
DF <- data.frame(Station = 0:10,
                 Station_time = c(0,3,2,4,1,5,2,4,3,5,1),#In units of hours
                 Steam_time = c(0,8,7,9,10,6,9,7,4,7,8))
DF
#>    Station Station_time Steam_time
#> 1        0            0          0
#> 2        1            3          8
#> 3        2            2          7
#> 4        3            4          9
#> 5        4            1         10
#> 6        5            5          6
#> 7        6            2          9
#> 8        7            4          7
#> 9        8            3          4
#> 10       9            5          7
#> 11      10            1          8

DFlng <- pivot_longer(DF, Station_time:Steam_time, names_to = "Type")
DFlng <- DFlng |> arrange(Station, desc(Type))
DFlng
#> # A tibble: 22 × 3
#>    Station Type         value
#>      <int> <chr>        <dbl>
#>  1       0 Steam_time       0
#>  2       0 Station_time     0
#>  3       1 Steam_time       8
#>  4       1 Station_time     3
#>  5       2 Steam_time       7
#>  6       2 Station_time     2
#>  7       3 Steam_time       9
#>  8       3 Station_time     4
#>  9       4 Steam_time      10
#> 10       4 Station_time     1
#> # … with 12 more rows
DFlng <- DFlng |> mutate(Cum = cumsum(value),
                         Arrv_Dep = ifelse(Type == "Steam_time", "Arr", "Dep"))

Final <- DFlng |> pivot_wider(names_from = c("Type", "Arrv_Dep"),
                     values_from = c("value", "Cum"))
colnames(Final) <- c("Station", "Steam_time", "Station_time", "Arrive","Depart")

Final
#> # A tibble: 11 × 5
#>    Station Steam_time Station_time Arrive Depart
#>      <int>      <dbl>        <dbl>  <dbl>  <dbl>
#>  1       0          0            0      0      0
#>  2       1          8            3      8     11
#>  3       2          7            2     18     20
#>  4       3          9            4     29     33
#>  5       4         10            1     43     44
#>  6       5          6            5     50     55
#>  7       6          9            2     64     66
#>  8       7          7            4     73     77
#>  9       8          4            3     81     84
#> 10       9          7            5     91     96
#> 11      10          8            1    104    105

Created on 2023-03-01 with reprex v2.0.2

By the way, I wouldn't do the pivot_wider() step at the end. I find the long data frame easier to read.

I would use this as an opportunity to reorganize to get a more compact representation.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
depart <- ymd_hm("2023-03-01 07:30")
set.seed(42)
legs <- sample(1000:2000,14*3,replace = TRUE)*3600
shipslog  <- depart + legs 
shipslog <- shipslog[order(shipslog)]
hour_diff <- diff(shipslog)/3600
attributes(hour_diff) <- NULL
shipslog <- data.frame(log = shipslog)
shipslog$duration <- c(NA,hour_diff)
shipslog$status <- ifelse(shipslog$duration < 20,"On station","Cruising")
shipslog[1,2] <- "Depature"
shipslog[42,2] <- "Return"
shipslog
#>                    log duration     status
#> 1  2023-04-12 22:30:00 Departure      <NA>
#> 2  2023-04-12 22:30:00        0 On station
#> 3  2023-04-13 23:30:00       25   Cruising
#> 4  2023-04-15 00:30:00       25   Cruising
#> 5  2023-04-17 06:30:00       54   Cruising
#> 6  2023-04-18 00:30:00       18 On station
#> 7  2023-04-18 07:30:00        7 On station
#> 8  2023-04-18 12:30:00        5 On station
#> 9  2023-04-18 19:30:00        7 On station
#> 10 2023-04-20 18:30:00       47   Cruising
#> 11 2023-04-21 10:30:00       16 On station
#> 12 2023-04-22 17:30:00       31   Cruising
#> 13 2023-04-23 17:30:00       24   Cruising
#> 14 2023-04-24 07:30:00       14 On station
#> 15 2023-04-24 08:30:00        1 On station
#> 16 2023-04-24 09:30:00        1 On station
#> 17 2023-04-24 13:30:00        4 On station
#> 18 2023-04-25 00:30:00       11 On station
#> 19 2023-04-25 07:30:00        7 On station
#> 20 2023-04-26 18:30:00       35   Cruising
#> 21 2023-04-29 00:30:00       54   Cruising
#> 22 2023-05-01 23:30:00       71   Cruising
#> 23 2023-05-03 11:30:00       36   Cruising
#> 24 2023-05-04 02:30:00       15 On station
#> 25 2023-05-05 07:30:00       29   Cruising
#> 26 2023-05-06 23:30:00       40   Cruising
#> 27 2023-05-06 23:30:00        0 On station
#> 28 2023-05-07 19:30:00       20   Cruising
#> 29 2023-05-07 20:30:00        1 On station
#> 30 2023-05-08 08:30:00       12 On station
#> 31 2023-05-16 21:30:00      205   Cruising
#> 32 2023-05-17 18:30:00       21   Cruising
#> 33 2023-05-18 06:30:00       12 On station
#> 34 2023-05-18 13:30:00        7 On station
#> 35 2023-05-18 16:30:00        3 On station
#> 36 2023-05-19 09:30:00       17 On station
#> 37 2023-05-19 21:30:00       12 On station
#> 38 2023-05-20 16:30:00       19 On station
#> 39 2023-05-20 18:30:00        2 On station
#> 40 2023-05-23 05:30:00       59   Cruising
#> 41 2023-05-23 11:30:00        6 On station
#> 42 2023-05-23 11:30:00   Return On station
1 Like

Thank you for this reorganization and formatting. I will try to do the same with my data and reply back if I get stuck.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.