combining data frames based on column between another column

I'm trying to combine two data tables, one table of events that have been grouped into zones, and another of sensor measurements. My goal is to compute summery statistics based on the grouped events. Longer term, I want to do change point detection to better end the zones. Short term, I need to get the two data frames combined.

I'm deeply embedded into the tidyverse, and I'm happy with a base R solution, but would be ok w/ with datatable solution as well. For context, I've got ~8MM sensor readings, and 15,000 events.

The first table of events and zones looks like this:

events2<-data.frame(
  event_id = seq(1,6),
  note = sample(c("a", "e", 'i', 'o', 'u', 'y')),
  client_time = c('2020-02-01 08:12:00', '2020-02-01 08:16:00', '2020-02-01 08:22:00', '2020-02-01 13:38:00', '2020-02-01 21:02:00', '2020-02-01 21:20:00'),
  zone_id = c(1,1,1,2,3,3),
  zone_start = c(rep('2020-02-01 08:12:00', 3), '2020-02-01 13:38:00', rep('2020-02-01 21:02:00', 2)),
  zone_end = c(rep('2020-02-01 12:30:00', 3), '2020-02-01 15:38:00', rep('2020-02-01 22:34:00', 2))
) %>% mutate(zone_start = ymd_hms(zone_start, tz = "UTC"), zone_end = ymd_hms(zone_end, tz = "UTC"))
head(events2)

The second table of measurements is much, much longer, and looks like this:

measurements <- data.frame(
    client_time = seq(as.POSIXct("2020-02-01 00:00:00", tz="GMT"),  length.out=1440, by='1 min'), #seq(ISOdate(2020,2,1), ISOdate(2020,2,2), "min"),
    value = (abs((120*(rnorm(1440))))+40)
) %>% mutate(client_time = ymd_hms(client_time, tz = "UTC"))
head(measurements)

I'm trying to create a new dataframe that joins the measurements to the zoned events based on if client_time on measurement is between the zone start and end. I'd also like to add a column based on what zone the measurement took place.

The zones do not overlap, and I do not anticipate the zones overlapping in the future.

I've tried grouping events on zone_id, joining the measurements, and filtering, but I can't seem to fill any values. Fuzzyjoin requires using bioconductor, which I anticipate will be hard to install.

Thank you for your patience.

data.table:

fuzzyjoin is on CRAN, so does not involve Bioconductor:
https://cran.r-project.org/web/packages/fuzzyjoin/index.html

1 Like

You can do a rolling join in data.table. That may be what you're looking for.

I found the solution using tidyr, and filling in values. This enabled grouping, and filtering. It's not as fast as I'd like, but it solves the problem.

This post I made goes into detail about how I solved it.

https://rpubs.com/eunoia/combinations

This is a really good idea for when I need to a speed up.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.