Joining datasets using foverlaps

I have two datasets that I would like to join based on the timestamp range. The timestamp doesn't always match in both data frames. Therefore, I would like to join based on a range. I have been advised to us foverlaps.

I'm using this code below but it isn't working not sure how to fulfill all the parameters:

require(data.table)
simple example:
x = data.table(df2,start=c(timestamp),end=c(taskId))
y = data.table(df1,start=c(timestamp),end=c(gpuMemUtilPerc))
setkey(y,start,end)
foverlaps(x, y, by.x = c("timestamp","hostname"), type="within", nomatch = 0L)

error message:

Error in foverlaps(x, y, by.x = c("timestamp", "hostname"), type = "within", :
The last two columns in by.x should correspond to the 'start' and 'end' intervals in data.table 'x' and must be integer/numeric type.

DF2

tibble::tribble(
                  ~timestamp,                                ~hostname,      ~eventName, ~eventType,                                            ~jobId,                                ~taskId,
  "2018-11-08T07:41:45.459Z", "04dc4e9647154250beeee51b866b0715000000",   "TotalRender",    "START", "1024-lvl12-7e026be3-5fd0-48ee-b7d1-abd61f747705", "00390eee-c26c-41da-a02d-556bb7fcac67",
  "2018-11-08T07:41:45.459Z", "04dc4e9647154250beeee51b866b0715000000", "Saving Config",    "START", "1024-lvl12-7e026be3-5fd0-48ee-b7d1-abd61f747705", "00390eee-c26c-41da-a02d-556bb7fcac67",
  "2018-11-08T07:41:32.461Z", "04dc4e9647154250beeee51b866b0715000000",        "Render",    "START", "1024-lvl12-7e026be3-5fd0-48ee-b7d1-abd61f747705", "00390eee-c26c-41da-a02d-556bb7fcac67",
  "2018-11-08T07:41:32.461Z", "04dc4e9647154250beeee51b866b0715000000", "Saving Config",     "STOP", "1024-lvl12-7e026be3-5fd0-48ee-b7d1-abd61f747705", "00390eee-c26c-41da-a02d-556bb7fcac67",
  "2018-11-08T07:42:09.344Z", "04dc4e9647154250beeee51b866b0715000000",        "Render",     "STOP", "1024-lvl12-7e026be3-5fd0-48ee-b7d1-abd61f747705", "00390eee-c26c-41da-a02d-556bb7fcac67"
  )

DF1

tibble::tribble(
                  ~timestamp,                                ~hostname,   ~gpuSerial,                                   ~gpuUUID, ~powerDrawWatt, ~gpuTempC, ~gpuUtilPerc, ~gpuMemUtilPerc,
  "2018-11-08T07:41:44.242Z", "04dc4e9647154250beeee51b866b0715000000", 323217056165, "GPU-a1119ee9-9cd1-919f-a479-b902142c717d",          25.94,       32L,           0L,              0L,
  "2018-11-08T07:41:47.259Z", "04dc4e9647154250beeee51b866b0715000000", 323217056165, "GPU-a1119ee9-9cd1-919f-a479-b902142c717d",          25.84,       32L,           0L,              0L,
  "2018-11-08T07:41:31.285Z", "04dc4e9647154250beeee51b866b0715000000", 323217056165, "GPU-a1119ee9-9cd1-919f-a479-b902142c717d",          25.84,       32L,           0L,              0L,
  "2018-11-08T07:41:33.301Z", "04dc4e9647154250beeee51b866b0715000000", 323217056165, "GPU-a1119ee9-9cd1-919f-a479-b902142c717d",          25.94,       32L,           0L,              0L,
  "2018-11-08T07:41:35.322Z", "04dc4e9647154250beeee51b866b0715000000", 323217056165, "GPU-a1119ee9-9cd1-919f-a479-b902142c717d",          25.84,       32L,           0L,              0L
  )

You are trying to use tibbles with data.table(), as the error message says, you need to convert them to data.tables with as.data.table()

Thanks! I have already tried this as.data.table(). I am getting this error message:

Error in foverlaps(as.data.table(df1), as.data.table(df2), type = "within",  : 
  'y' must be keyed (i.e., sorted, and, marked as sorted). Call setkey(y, ...) first, see ?setkey. Also check the examples in ?foverlaps

The error message is telling you that df2 must be keyed.

I think you need to read up on the data.table vignettes first:
https://cran.r-project.org/web/packages/data.table/index.html

I have updated my code and I am now getting a diffrent error message. Any idea where i am going wrong please?

For a question like this, it's usually best practice to include how you updated your code and what your new error is. Ideally a reprex (FAQ: What's a reproducible example (`reprex`) and how do I do one?) that folks here might tackle.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.