Finding out if a date is within a range

Hello

Fairly new to the R studio area (especially when it comes to loop commands)
Was wondering if someone can help me with a small problem.

I have a large file (800,000 rows). Data is vertical so each row represents an individual (personcode)

each row has a startdate and an end date.

There is an event which occurs for some individuals. Im trying to figure out the easiest way if the outcome happened between any of the start date/end date.

As an example

Personcode startdate enddate eventdate
1 01.01.2010 05.01.2010 10.10.2010
1 01.10.2010 15.10.2010
2 02.02.2012 10.09.2012
3 05.06.2014 06.06.2014 23.06.2014
3 02.02.2018 06.02.2018
3 04.06.2018 04.06.2018

Person 1 the event happened for that individual (in the second row) so the event would be a Yes. Person 2 never had an event. Person 3 had an event but they did not occur between any of the start and end dates for the individual. (I guess ideally a new column is created with Y or N to see if it matches my interest).

Can anyone direct me to a resource or a previous post which would help me with this problem?

Many thanks and much appreciated in advance

It 'looks like' you have characters that represent dates rather than being actual R dates. so my example accounts for that.

df1 <- data.frame(
  stringsAsFactors = FALSE,
  Personcode = c(1L, 1L, 2L, 3L, 3L, 3L),
  startdate = c("01.01.2010","01.10.2010",
                "02.02.2012","05.06.2014","02.02.2018","04.06.2018"),
  enddate = c("05.01.2010","15.10.2010",
              "10.09.2012","06.06.2014","06.02.2018","04.06.2018"),
  eventdate = c("10.10.2010", NA, NA, "23.06.2014", NA, NA)
)

library(tidyverse)
library(lubridate)
(df2 <- tibble(df1)  %>% mutate(across(-Personcode,dmy)))

(pcal <- df2 %>% distinct(Personcode,
                          startdate,
                          enddate))
(pev <- df2 %>% distinct(Personcode,
                          eventdate) %>% na.omit())

(rejoin <- full_join(pcal,pev))

(df3 <- rejoin %>% rowwise() %>% 
                 mutate(had_event = between(x = eventdate,
                                         left = startdate,
                                        right = enddate)))

(fin <- tidyr::replace_na(df3,
                          list(had_event=FALSE)))
1 Like

Another option

library(tidyverse)
library(lubridate)

sample_df <- data.frame(
    stringsAsFactors = FALSE,
    Personcode = c(1L, 1L, 2L, 3L, 3L, 3L),
    startdate = c("01.01.2010","01.10.2010",
                  "02.02.2012","05.06.2014","02.02.2018","04.06.2018"),
    enddate = c("05.01.2010","15.10.2010",
                "10.09.2012","06.06.2014","06.02.2018","04.06.2018"),
    eventdate = c("10.10.2010", NA, NA, "23.06.2014", NA, NA)
)

sample_df %>%
    group_by(Personcode) %>% 
    mutate(across(ends_with("date"), dmy),
           event = first(eventdate) >= startdate & first(eventdate) <= enddate)
#> # A tibble: 6 × 5
#> # Groups:   Personcode [3]
#>   Personcode startdate  enddate    eventdate  event
#>        <int> <date>     <date>     <date>     <lgl>
#> 1          1 2010-01-01 2010-01-05 2010-10-10 FALSE
#> 2          1 2010-10-01 2010-10-15 NA         TRUE 
#> 3          2 2012-02-02 2012-09-10 NA         NA   
#> 4          3 2014-06-05 2014-06-06 2014-06-23 FALSE
#> 5          3 2018-02-02 2018-02-06 NA         FALSE
#> 6          3 2018-06-04 2018-06-04 NA         FALSE

Created on 2021-10-26 by the reprex package (v2.0.1)

1 Like

Many thanks for the suggestions
Will be trying out the codes this weekend

Kind regards

Hicham

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.