Extracting ID with consecutive dates

Hi everyone

I have a dataset that looks like this:

Patient_ID Visit_DateTime
1 13-08-2016 08:00:00
1 28-05-2013 08:00:00
1 24-12-2013 08:00:00
2 23-05-2012 08:00:00
2 11-01-2014 08:00:00
2 03-10-2017 08:00:00
2 16-02-2015 08:00:00
3 03-08-2013 08:00:00
3 28-10-2017 08:00:00
3 27-10-2013 08:00:00
3 18-04-2015 08:00:00
4 28-04-2016 08:00:00
4 07-08-2012 08:00:00
4 19-03-2013 08:00:00
4 25-05-2017 08:00:00
5 21-11-2014 08:00:00
5 20-03-2016 08:00:00
5 08-01-2015 08:00:00
5 21-09-2012 08:00:00

What i want to do is to extract a list of patients who have been seen in any 2 consecutive years.

Could anyone help please?

Kind regards

Since your data is not in reprex and is difficult to just copy-paste, I'll give you a somewhat general advice with code that is close to what you want:

df %>%
  dplyr::group_by(Patient_ID) %>%
  dplyr::arrange(Visit_DateTime) %>%
  dplyr::mutate(last_visit = dplyr::lag(Visit_DateTime)) %>%
  dplyr::mutate(diff = difftime(Visit_DateTime, last_visit, units = "days")) %>%
  dplyr::ungroup() %>%
  dplyr::filter(diff > 365) %>%
  dplyr::pull(Patient_ID)

The main idea is that you can use lag function from dplyr to create a column with the last visit and from there it should be relatively straight-forward to apply logic about 2 consecutive years.

The way I did it above is to use 365 days as years, but you can also extract years directly with lubridate::year and compare them directly. Since it is not obvious what you mean by 2 consecutive years then you can use either approach.

3 Likes

Thank you that is a very nice way of solving it