How to match dates across different variables?

latsyrhccm · April 28, 2020, 12:27pm

I'm attempting to match dates in 3 variables (date1:date3) to my main date variable.

I've tried using dplyr join methods but to no avail, as when I specify the variables to merge by it will only combine them together instead of matching them across columns.

I've attached my reprex attempt and the ideal output, any help would be appreciated. Thanks.

#Example dataframe 1
patientid <- c("-2147483645", "-2147483645", "-2147483645", "-2147483645", "-2147483645")
date <- c("2018-08-30", "2018-09-04", "2018-09-13", "2018-09-28", "2018-10-11")

#Example dataframe 2 (one date variable for example as dates in others are too far down the dataframe)
patientid <- c("-2147483645", "-2147483645", "-2147483645", "-2147483645", "-2147483645")
date1 <- c("2018-09-04", "2018-09-28")

#Current output
patientid date       date1      
1         2018-08-30 2018-09-04 
1         2018-09-04 NA         
1         2018-09-13 2018-09-28 
1         2018-09-28 NA         

#Ideal output
patientid date       date1      exacerbation
1         2018-08-30 NA         NO
1         2018-09-04 2018-09-04 YES
1         2018-09-13 NA         NO
1         2018-09-28 2018-09-28 YES

ulfelder · April 28, 2020, 6:46pm

Sorry, I don't understand what you're trying to do, and the posted example doesn't clear it up for me. When you say "match" here, what do you mean? Perhaps you can post a more developed example?

siddharthprabhu · April 28, 2020, 6:47pm

So you're looking for a way to keep both sets of join keys in the output? This is currently only supported in nest_join() by way of the keep = TRUE argument. Once, you have both keys, creating the exacerbation variable is straightforward.

library(tidyverse)

#Example dataframe 1
df1 <- tibble(patientid = c("-2147483645", "-2147483645", "-2147483645", "-2147483645", "-2147483645"),
                  date = c("2018-08-30", "2018-09-04", "2018-09-13", "2018-09-28", "2018-10-11"))

#Example dataframe 2
df2 <- tibble(patientid = c("-2147483645", "-2147483645"),
                  date1 = c("2018-09-04", "2018-09-28"))

df1 %>% 
  nest_join(df2, by = c("patientid", "date" = "date1"), keep = TRUE) %>% 
  unnest(cols = df2, keep_empty = TRUE, names_repair = "universal") %>% 
  select(patientid = patientid...1, date, date1) %>% 
  mutate(exacerbation = if_else(is.na(date1), "NO", "YES"))
#> New names:
#> * patientid -> patientid...1
#> * patientid -> patientid...3
#> # A tibble: 5 x 4
#>   patientid   date       date1      exacerbation
#>   <chr>       <chr>      <chr>      <chr>       
#> 1 -2147483645 2018-08-30 <NA>       NO          
#> 2 -2147483645 2018-09-04 2018-09-04 YES         
#> 3 -2147483645 2018-09-13 <NA>       NO          
#> 4 -2147483645 2018-09-28 2018-09-28 YES         
#> 5 -2147483645 2018-10-11 <NA>       NO

^{Created on 2020-04-29 by the reprex package (v0.3.0)}

In dplyr 1.0.0, the keep parameter will be added to the outer joins after which this operation can be performed without have to resort to tidyr::unnest().

system · May 19, 2020, 6:47pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.