how to subset POSIXct based on POSIXct range

I have a problem figuring out how to extract the time of events that happened during another time of an event(in my case letters). I hope someone could help me out!
in short: I would like the rows of bb tibble whose start or end or both(start and end)time is within the range of aa records.
The end goal is to find how many times the letters occurred in aa tibble.
Please, do let me know if more clarifications are needed!
Thank you in advance!

aa <- tibble(
           start = as.POSIXct(c("2019-05-02 07:08:49", "2019-05-02 07:09:21",
                                    "2019-05-02 07:09:41", "2019-05-02 07:10:05",
                                    "2019-05-02 07:24:52", "2019-05-02 07:28:50",
                                    "2019-05-02 07:29:23", "2019-05-02 07:30:16",
                                    "2019-05-02 07:33:13", "2019-05-02 07:33:43",
                                    "2019-05-02 07:35:31", "2019-05-02 07:36:29",
                                    "2019-05-02 07:38:14", "2019-05-02 07:43:26",
                                    "2019-05-02 07:44:59", "2019-05-02 07:53:45",
                                    "2019-05-02 07:54:28")),
           end = as.POSIXct(c("2019-05-02 07:09:29", "2019-05-02 07:10:02",
                                    "2019-05-02 07:10:17", "2019-05-02 07:10:40",
                                    "2019-05-02 07:29:10", "2019-05-02 07:29:32",
                                    "2019-05-02 07:30:35", "2019-05-02 07:30:53",
                                    "2019-05-02 07:33:48", "2019-05-02 07:34:18",
                                    "2019-05-02 07:36:06", "2019-05-02 07:38:34",
                                    "2019-05-02 07:38:49", "2019-05-02 07:45:19",
                                    "2019-05-02 07:45:35", "2019-05-02 07:54:20",
                                    "2019-05-02 07:55:03")))



bb <- tibble(letters = sample(letters[1:4], 12, TRUE), 
             started = as.POSIXct(c("2019-05-02 07:30:23", "2019-05-02 07:30:56",
                                      "2019-05-02 07:31:29", "2019-05-02 07:31:55",
                                      "2019-05-02 07:32:22", "2019-05-02 07:32:48",
                                      "2019-05-02 07:33:14", "2019-05-02 07:44:36",
                                      "2019-05-02 07:45:11", "2019-05-02 07:45:36",
                                      "2019-05-02 07:46:01", "2019-05-02 07:48:14"
                                     )),
             stopped = as.POSIXct(c("2019-05-02 07:30:56", "2019-05-02 07:31:29",
                                    "2019-05-02 07:31:55", "2019-05-02 07:32:22",
                                    "2019-05-02 07:32:48", "2019-05-02 07:33:14",
                                    "2019-05-02 07:33:40", "2019-05-02 07:45:10",
                                    "2019-05-02 07:45:36", "2019-05-02 07:46:01",
                                    "2019-05-02 07:46:25", "2019-05-02 07:48:48"
                                    )))

Is this what you mean?

library(tibble)
library(dplyr)
library(fuzzyjoin)
set.seed(1)

aa <- tibble(
    start = as.POSIXct(c("2019-05-02 07:08:49", "2019-05-02 07:09:21",
                         "2019-05-02 07:09:41", "2019-05-02 07:10:05",
                         "2019-05-02 07:24:52", "2019-05-02 07:28:50",
                         "2019-05-02 07:29:23", "2019-05-02 07:30:16",
                         "2019-05-02 07:33:13", "2019-05-02 07:33:43",
                         "2019-05-02 07:35:31", "2019-05-02 07:36:29",
                         "2019-05-02 07:38:14", "2019-05-02 07:43:26",
                         "2019-05-02 07:44:59", "2019-05-02 07:53:45",
                         "2019-05-02 07:54:28")),
    end = as.POSIXct(c("2019-05-02 07:09:29", "2019-05-02 07:10:02",
                       "2019-05-02 07:10:17", "2019-05-02 07:10:40",
                       "2019-05-02 07:29:10", "2019-05-02 07:29:32",
                       "2019-05-02 07:30:35", "2019-05-02 07:30:53",
                       "2019-05-02 07:33:48", "2019-05-02 07:34:18",
                       "2019-05-02 07:36:06", "2019-05-02 07:38:34",
                       "2019-05-02 07:38:49", "2019-05-02 07:45:19",
                       "2019-05-02 07:45:35", "2019-05-02 07:54:20",
                       "2019-05-02 07:55:03")))



bb <- tibble(letters = sample(letters[1:4], 12, TRUE), 
             started = as.POSIXct(c("2019-05-02 07:30:23", "2019-05-02 07:30:56",
                                    "2019-05-02 07:31:29", "2019-05-02 07:31:55",
                                    "2019-05-02 07:32:22", "2019-05-02 07:32:48",
                                    "2019-05-02 07:33:14", "2019-05-02 07:44:36",
                                    "2019-05-02 07:45:11", "2019-05-02 07:45:36",
                                    "2019-05-02 07:46:01", "2019-05-02 07:48:14"
             )),
             stopped = as.POSIXct(c("2019-05-02 07:30:56", "2019-05-02 07:31:29",
                                    "2019-05-02 07:31:55", "2019-05-02 07:32:22",
                                    "2019-05-02 07:32:48", "2019-05-02 07:33:14",
                                    "2019-05-02 07:33:40", "2019-05-02 07:45:10",
                                    "2019-05-02 07:45:36", "2019-05-02 07:46:01",
                                    "2019-05-02 07:46:25", "2019-05-02 07:48:48"
             )))

started <- bb %>% 
    fuzzy_inner_join(aa,
                     by = c("started" = "start",
                            "started" = "end"), 
                     match_fun = list(`>=`, `<=`))

stopped <- bb %>% 
    fuzzy_inner_join(aa,
                     by = c("stopped" = "start",
                            "stopped" = "end"), 
                     match_fun = list(`>=`, `<=`))

started %>% 
    bind_rows(stopped) %>% 
    distinct(letters, started, stopped, start, end, .keep_all = TRUE)
#> # A tibble: 8 x 5
#>   letters started             stopped             start              
#>   <chr>   <dttm>              <dttm>              <dttm>             
#> 1 a       2019-05-02 07:30:23 2019-05-02 07:30:56 2019-05-02 07:29:23
#> 2 a       2019-05-02 07:30:23 2019-05-02 07:30:56 2019-05-02 07:30:16
#> 3 c       2019-05-02 07:33:14 2019-05-02 07:33:40 2019-05-02 07:33:13
#> 4 c       2019-05-02 07:44:36 2019-05-02 07:45:10 2019-05-02 07:43:26
#> 5 b       2019-05-02 07:45:11 2019-05-02 07:45:36 2019-05-02 07:43:26
#> 6 b       2019-05-02 07:45:11 2019-05-02 07:45:36 2019-05-02 07:44:59
#> 7 a       2019-05-02 07:32:48 2019-05-02 07:33:14 2019-05-02 07:33:13
#> 8 c       2019-05-02 07:44:36 2019-05-02 07:45:10 2019-05-02 07:44:59
#> # ā€¦ with 1 more variable: end <dttm>

Created on 2019-06-07 by the reprex package (v0.3.0)

2 Likes

Hi, andresrcs, thank you very much for your reply. I was playing all day with the solutions provided by you and stackoverflow. After the n-th check I concluded that the foverlaps() captures everything I want and the modification, if necessary, would be easier. And the computation is faster. Thank you! All the best!

Would you mind posting a link to the SO answer that worked for you and mark that as a solution for this topic?

Note: Please take a look to our Cross-Posting policy

Hi andresrcs, I apologize, I didn't want to be impolite. I'll take a note! I read the cross-post policy. Thank you!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.