Hi everyone
I have a dataset that looks like this:
Patient_ID Visit_DateTime
1 13-08-2016 08:00:00
1 28-05-2013 08:00:00
1 24-12-2013 08:00:00
2 23-05-2012 08:00:00
2 11-01-2014 08:00:00
2 03-10-2017 08:00:00
2 16-02-2015 08:00:00
3 03-08-2013 08:00:00
3 28-10-2017 08:00:00
3 27-10-2013 08:00:00
3 18-04-2015 08:00:00
4 28-04-2016 08:00:00
4 07-08-2012 08:00:00
4 19-03-2013 08:00:00
4 25-05-2017 08:00:00
5 21-11-2014 08:00:00
5 20-03-2016 08:00:00
5 08-01-2015 08:00:00
5 21-09-2012 08:00:00
What i want to do is to extract a list of patients who have been seen in any 2 consecutive years.
Could anyone help please?
Kind regards
Since your data is not in reprex and is difficult to just copy-paste, I'll give you a somewhat general advice with code that is close to what you want:
df %>%
dplyr::group_by(Patient_ID) %>%
dplyr::arrange(Visit_DateTime) %>%
dplyr::mutate(last_visit = dplyr::lag(Visit_DateTime)) %>%
dplyr::mutate(diff = difftime(Visit_DateTime, last_visit, units = "days")) %>%
dplyr::ungroup() %>%
dplyr::filter(diff > 365) %>%
dplyr::pull(Patient_ID)
The main idea is that you can use lag
function from dplyr
to create a column with the last visit and from there it should be relatively straight-forward to apply logic about 2 consecutive years.
The way I did it above is to use 365 days as years, but you can also extract years directly with lubridate::year
and compare them directly. Since it is not obvious what you mean by 2 consecutive years then you can use either approach.
3 Likes
Thank you that is a very nice way of solving it