Mutate function

Hello

I have a question about the "mutate" function
I have this dataset and I would like to create a new "Total time" column that would be equal to each station's sample time.
So

 df <- df %<%
mutate(Total time = Time_out - Time_in)

Except that I also have stations where there are NA values ​​for Time_in and Time_out.
I would then like to be able to apply the average value of total time of the other stations (where time_in and time_out are known) to the cells of the Total_time column which cannot be calculated due to the NA of the other columns

df <- tribble(
  ~Station, ~time_in, ~time_out, 
  1, 08:17:00, 08:35:00, 
  2, 07:36:03, 08:05:45, 
  3, 11:05:26, 11:35:05, 
  4, 04:58:33, 05:27:15, 
  5, NA, NA, 
  6, NA, NA,  
  7, 14:18:37, 14:47:49
)

Thanks you very mych

I used the hms package to convert the character times to values in seconds.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
library(hms)
df <- tribble(
  ~Station, ~time_in, ~time_out, 
  1, "08:17:00", "08:35:00", 
  2, "07:36:03", "08:05:45", 
  3, "11:05:26", "11:35:05", 
  4, "04:58:33", "05:27:15", 
  5, NA, NA, 
  6, NA, NA,  
  7, "14:18:37", "14:47:49"
)
df |> mutate(across(time_in:time_out, as_hms),
             TotalTime = time_out - time_in,
             TotalTime = ifelse(is.na(TotalTime), 
                                mean(TotalTime, na.rm = TRUE), 
                                TotalTime))
#> # A tibble: 7 x 4
#>   Station time_in  time_out TotalTime
#>     <dbl> <time>   <time>       <dbl>
#> 1       1 08:17:00 08:35:00      1080
#> 2       2 07:36:03 08:05:45      1782
#> 3       3 11:05:26 11:35:05      1779
#> 4       4 04:58:33 05:27:15      1722
#> 5       5       NA       NA      1623
#> 6       6       NA       NA      1623
#> 7       7 14:18:37 14:47:49      1752

Created on 2022-05-19 by the reprex package (v2.0.1)

1 Like

Wouah thanks very much it's exactly what I wanted !
I didn't know that package thank you

I have one more question about my data because there is a problem
For some data the Time_in and the Time_out are for two different days, so I can have for exemple Time_in = 23:42:00 and Time_out = 00:12:00
Those type of data give me negative value for Total_time..

how can I figure it out ?

Thanks

The best solution is to use full timestamps with the day included. If you cannot do that, you can use a test that if time_out is less than time_in, add 86400 (the number of seconds in a day) to the result. This will only work if no time span is greater than 24 hours.

df |> mutate(across(time_in:time_out, as_hms),
             TotalTime = ifelse(time_out > time_in, 
                                time_out - time_in,
                                time_out - time_in + 86400),
             TotalTime = ifelse(is.na(TotalTime), 
                                mean(TotalTime, na.rm = TRUE), 
                                TotalTime))
1 Like

I have the day included in my data but only the day where the sampling started..
The second option work very well ! I just have to do :

df <- df %>% 
      mutate(TotalTime = TotalTime / 60)

to have my TotalTime in minute !
Thanks a lot

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.