Filter "hms" type data by using a "hh:mm:ss" format

Daichi · May 17, 2022, 6:42pm

Hi. I'd like to filter "hms" type data using an "hh:mm:ss" format, as in filter(ride_length > '00:10:00'), but I couldn't get it to filter properly. When I filtered in seconds instead, as in filter(ride_length > 10*60), it seemed to work. So I'm assuming I need to format the ride_length data as "hh:mm:ss" and then apply filter(), but I have no idea how to do that.

Thank you very much for any help you can offer!
Below is the code:

# Pre-processing
> data <- lapply(csv_list, read_csv) %>% 
+  bind_rows() %>%
+  na.omit() %>% 
+  filter(ride_length > 0)

# Check the structure of the "ride_length" column
> data %>% 
+   select(ride_length) %>% 
+   str()
tibble [4,640,811 × 1] (S3: tbl_df/tbl/data.frame)
 $ ride_length: 'hms' num [1:4640811] 00:14:04 00:05:55 00:48:07 00:06:28 ...
  ..- attr(*, "units")= chr "secs"
 - attr(*, "na.action")= 'omit' Named int [1:1082175] 3 23 26 28 34 36 37 42 46 52 ...
  ..- attr(*, "names")= chr [1:1082175] "3" "23" "26" "28" ...

# Check the data of the "ride_length" column
> data %>% 
+   select(ride_length) %>% 
+   head(5)
# A tibble: 5 × 1
  ride_length
  <time>     
1 14'04"     
2 05'55"     
3 48'07"     
4 06'28"     
5 09'09" 

# Filter to show rides longer than 10 mins (it didn't work: 05'55" and 06'28" still appear)
> data %>% 
+   select(ride_length) %>% 
+   filter(ride_length > '00:10:00') %>% 
+   head(5)
# A tibble: 5 × 1
  ride_length
  <time>     
1 14'04"     
2 05'55"     
3 48'07"     
4 06'28"     
5 09'09" 

# Filter to show rides longer than 10 mins (= 600 secs) in a different way (it seemed to work)
> data %>% 
+   select(ride_length) %>% 
+   filter(ride_length > 10*60) %>% 
+   head(5)
# A tibble: 5 × 1
  ride_length
  <time>     
1 14'04"     
2 48'07"     
3 13'14"     
4 36'25"     
5 14'20"
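A likely explanation for why the string version lets everything through (an assumption, not confirmed in this thread): when a number is compared with a character string, R converts the number to text and compares the two alphabetically. A duration stored as 355 seconds becomes "355", and "3" sorts after "0", so it counts as greater than "00:10:00". The sketch below uses a plain numeric vector of seconds as a stand-in for the hms column; that the hms class ends up on exactly the same path is assumed here.

```r
# Stand-in for the ride_length column: durations in seconds
# (14'04", 05'55", 48'07", 06'28", 09'09")
secs <- c(844, 355, 2887, 388, 549)

# Number vs string: the numbers are converted to text first,
# so this is an alphabetical comparison, not a numeric one
as.character(secs)    # "844" "355" "2887" "388" "549"
secs > "00:10:00"     # all TRUE, because no value starts with "0"

# Number vs number gives the intended result
secs > 10 * 60        # TRUE FALSE TRUE FALSE FALSE
```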

Sanjmeh · May 17, 2022, 6:48pm

Which package are you using that generates data of class hms? You could paste a small part of the data using dput() for our reference.

Daichi · May 17, 2022, 7:12pm

Thank you for your reply, Sanjmeh. I installed the "tidyverse" package as well as the "lubridate" and "hms" packages.
Below is the list of all the attached packages.

> search()
 [1] ".GlobalEnv"        "package:lubridate" "package:hms"       "package:forcats"   "package:stringr"  
 [6] "package:dplyr"     "package:purrr"     "package:readr"     "package:tidyr"     "package:tibble"   
[11] "package:ggplot2"   "package:tidyverse" "tools:rstudio"     "package:stats"     "package:graphics" 
[16] "package:grDevices" "package:utils"     "package:datasets"  "package:methods"   "Autoloads"        
[21] "org:r-lib"         "package:base"

Daichi · May 17, 2022, 7:15pm

I ran dput(data$ride_length), but the output is too big to paste in full, so here is part of it for your reference.

> dput(data$ride_length)
......
6498, 1041, 199, 410, 1949, 412, 306, 192, 656, 433, 715, 583, 
307, 1288, 216, 2053, 311, 160, 485, 558, 740, 352, 733, 36, 
1484, 1924, 1029, 1604, 1009, 520, 510, 436, 507, 785, 416, 563, 
414, 533, 308, 433, 421, 269, 1352, 3005, 528, 4039, 160, 631, 
505, 359, 206, 346, 1323, 153, 907, 919, 690, 369, 612, 614, 
140, 740), class = c("hms", "difftime"), units = "secs")
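Because the full dput() output is too long to post, one option is to pass dput() only the first few values with head(). A minimal sketch, using a hypothetical stand-in vector `sample_rl` built with hms::as_hms() in place of the real data$ride_length (assumes the hms package is installed):

```r
# Hypothetical stand-in for data$ride_length: durations in seconds
sample_rl <- hms::as_hms(c(844, 355, 2887, 388, 549, 794, 407))

# Print only the first 5 values: small enough to paste into a post,
# and still carries the hms/difftime class and units
dput(head(sample_rl, 5))
```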

nirgrahamuk · May 19, 2022, 9:38am

library(tidyverse)
library(lubridate)
library(hms)

rl <- structure(c(6498, 1041, 199, 410, 1949, 412, 306, 192, 656, 433, 715, 583, 
307, 1288, 216, 2053, 311, 160, 485, 558, 740, 352, 733, 36, 
1484, 1924, 1029, 1604, 1009, 520, 510, 436, 507, 785, 416, 563, 
414, 533, 308, 433, 421, 269, 1352, 3005, 528, 4039, 160, 631, 
505, 359, 206, 346, 1323, 153, 907, 919, 690, 369, 612, 614, 
140, 740), class = c("hms", "difftime"), units = "secs")

(rl_df <- enframe(rl))

# 10 min or above

filter(rl_df,
       value >= parse_hms("00:10:00"))

# under 10 min
filter(rl_df,
       value < parse_hms("00:10:00"))
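Why this works, as far as I can tell: parse_hms() turns the text "00:10:00" into an hms value of 600 seconds, so both sides of the comparison are durations and R compares them as numbers rather than as text. A small check on made-up stand-in durations (assumes the hms package; hms::hms(minutes = 10) is shown as an equivalent way to build the threshold, written with the hms:: prefix because lubridate also exports a different hms() function):

```r
library(hms)

threshold <- parse_hms("00:10:00")
as.numeric(threshold)                   # 600 (seconds)
threshold == hms::hms(minutes = 10)     # TRUE: same duration built another way

# Stand-in durations in seconds: 14'04", 05'55", 48'07", 06'28", 09'09"
rides <- as_hms(c(844, 355, 2887, 388, 549))

# Duration vs duration is a numeric comparison, so only 14'04" and 48'07" remain
rides[rides >= threshold]
```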

Daichi · May 19, 2022, 3:51pm

This code returned the results I needed. Thank you, nirgrahamuk!

Daichi · May 19, 2022, 5:11pm

I noticed I can use the parse_hms() function on the ride_length column too:

> data %>% 
+   select(ride_length) %>% 
+   head(10)
# A tibble: 10 × 1
   ride_length
   <time>     
 1 14'04"     
 2 05'55"     
 3 48'07"     
 4 06'28"     
 5 09'09"     
 6 13'14"     
 7 06'47"     
 8 04'39"     
 9 06'50"     
10 36'25"

> data %>% 
+   filter(ride_length >= parse_hms("00:10:00")) %>% 
+   select(ride_length) %>% 
+   head(10)
# A tibble: 10 × 1
   ride_length
   <time>     
 1 14'04"     
 2 48'07"     
 3 13'14"     
 4 36'25"     
 5 14'20"     
 6 15'28"     
 7 29'14"     
 8 10'01"     
 9 23'54"     
10 10'24"
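For anyone who wants to try the dplyr version without the original CSV files, here is a self-contained sketch on a small made-up tibble (`rides` is hypothetical; only its ride_length column mirrors the real data). It computes the threshold once and also keeps the complementary subset, matching the two filters nirgrahamuk posted:

```r
library(dplyr)
library(hms)

# Hypothetical stand-in data: ride lengths in seconds, stored as hms
rides <- tibble(ride_length = as_hms(c(844, 355, 2887, 388, 549, 794)))

# Parse the "hh:mm:ss" threshold once and reuse it
ten_min <- parse_hms("00:10:00")

long_rides  <- rides %>% filter(ride_length >= ten_min)   # 10 min or above
short_rides <- rides %>% filter(ride_length <  ten_min)   # under 10 min
```

Building the threshold once keeps both filters consistent if the cut-off changes later.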

system · May 26, 2022, 5:11pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
