join two data based on a date and a period

Hi
I have two data that I would like to join with the "left_join" function. However, in my first table I have a "date" column, in my second I have two columns : "date_debut" and "date_fin", to definine a period or interval. I would like to join these data if the date is included in the interval of the other table.

Thank you very much for your help and your answer !

You can do this with the fuzzyjoin package. Note that the by argument shows which column will be compared and the match_fun argument determines the function to be used in the comparison. The functions are >= and <= and are enclosed in back ticks because <= and >= are not normally treated as function names.

library(fuzzyjoin)
#> Warning: package 'fuzzyjoin' was built under R version 4.1.2
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
DF <- data.frame(Date = ymd(c("2022-01-12", "2022-01-31", "2022-02-10")),
                 Value = 1:3)
DF2 <- data.frame(Date_debut = ymd(c("2022-01-10", "2022-01-28", "2022-02-01")),
                  Date_fin = ymd(c("2022-01-14", "2022-01-31", "2022-02-07")),
                  Value = 11:13)

JOINED <- fuzzy_left_join(DF, DF2, by = c(Date = "Date_debut", Date = "Date_fin"),
                          match_fun = list(`>=`, `<=`))
JOINED
#>         Date Value.x Date_debut   Date_fin Value.y
#> 1 2022-01-12       1 2022-01-10 2022-01-14      11
#> 2 2022-01-31       2 2022-01-28 2022-01-31      12
#> 3 2022-02-10       3       <NA>       <NA>      NA

Created on 2022-02-01 by the reprex package (v2.0.1)

1 Like

The data.table solution:
Fast overlap joins — foverlaps • data.table (rdatatable.gitlab.io)

If you have a large amount of data then this should be much quicker.

2 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.