Filter dataset based on ID and date from another dataset

Hello, I present the folowing situation:

I have 2 dataframes

DT1 consists of unique IDs and dates:

ID     Date
A       01/04/2020
B       05/04/2020

DT2 also contains a list of IDs and dates

ID     Date
A       01/03/2020
A       30/03/2020
A       01/04/2020
A       15/04/2020
B       30/03/2020
B       01/04/2020
B       15/04/2020
C       15/04/2020

DT2 needs to be filtered so only rows with IDs form DT1 remain and the date needs to be bigger than or equal to the date next to the corresponding ID of DT2

ID     Date
A       01/04/2020
A       15/04/2020
B       15/04/2020

Filtering for the IDs goes like this:

DT2 <- filter(DT2, ID %in% DT1$ID)

But how do you add an AND-statement to filter for dates bigger than or equal to the date next to the corresponding ID?

Thanks in advance for your help

Hi @kegoosse,
Here's one way to do it:

a <- "
ID     Date
A       01/04/2020
B       05/04/2020
"

DT1 <- read.table(text=a, header=TRUE, stringsAsFactors=FALSE)
DT1$Date <- as.Date(DT1$Date, format="%d/%m/%Y")
DT1

b <- "
ID     Date
A       01/03/2020
A       30/03/2020
A       01/04/2020
A       15/04/2020
B       30/03/2020
B       01/04/2020
B       15/04/2020
C       15/04/2020
"

DT2 <- read.table(text=b, header=TRUE, stringsAsFactors=FALSE)
DT2$Date <- as.Date(DT2$Date, format="%d/%m/%Y")
DT2

library(dplyr)
inner_join(DT2, DT1, by="ID") %>% 
  filter(Date.x >= Date.y) %>% 
  select(-Date.y) %>% 
  rename(Date = Date.x) -> DT3

DT3

HTH

2 Likes

Thanks. Worked elegantly.

Other option using fuzzyjoin

library(dplyr)
library(lubridate)
library(fuzzyjoin)

DT1 <- data.frame(
  stringsAsFactors = FALSE,
                ID = c("A", "B"),
              Date = c("01/04/2020", "05/04/2020")
)

DT2 <- data.frame(
  stringsAsFactors = FALSE,
                ID = c("A", "A", "A", "A", "B", "B", "B", "C"),
              Date = c("01/03/2020","30/03/2020",
                       "01/04/2020","15/04/2020","30/03/2020","01/04/2020",
                       "15/04/2020","15/04/2020")
)

DT1 %>%
    mutate(Date = dmy(Date)) %>% 
    fuzzy_left_join(DT2 %>%
                        mutate(Date = dmy(Date)),
                    by = c("ID" = "ID",
                           "Date" = "Date"),
                    match_fun = list(`==`, `<=`)) %>% 
    select(ID = ID.y, Date = Date.y)
#>   ID       Date
#> 1  A 2020-04-01
#> 2  A 2020-04-15
#> 3  B 2020-04-15

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.