How to combine hospital admission records?

Business context:

Each patient has admission and discharged date from the hospital. However, the day from admission and discharge data could be stored into multiple rows. (e.g. patient John). Also, each patient has a unique Index Date.

I want to get all the records of patients whose discharged and Index date are same. Then, I want to see all the previous or later records for those patients if that records has continuity.

Continuity means find the first time he got admission and the last he got discharged, considering the days overlap with Index date only. Take example John, the index and discharge date is same on Aug 2011. Then I can see it admission date is June 11. I will see previous record to see if June 11 matches the discharged date, if it does, include that row. I will see next record from the index date to see if the discharged date Aug 2011 matches, the admission date, if it does then include that row.

Below is my sample data

library(tidyverse)
library(lubridate)

sample_date <- tibble::tribble(
  ~patient,   ~admission,  ~discharged,  ~Index_date,
    "John", "01/04/2011", "01/06/2011", "01/08/2011",
    "John", "01/06/2011", "01/08/2011", "01/08/2011",
    "John", "01/08/2011", "01/10/2011", "01/08/2011",
   "kevin", "01/10/2012", "01/12/2012", "01/12/2012",
     "Sam", "01/08/2012", "01/09/2012", "01/01/2013",
     "Sam", "01/10/2012", "01/01/2013", "01/01/2013",
     "Sam", "01/01/2013", "01/03/2013", "01/01/2013",
    "Matt", "01/05/2012", "01/06/2012", "01/06/2012",
    "Matt", "01/08/2012", "01/09/2012", "01/06/2012",
    "Matt", "01/09/2012", "01/11/2012", "01/06/2012",
   "Sunny", "01/01/2012", "01/05/2012", "01/08/2012",
   "Sunny", "01/05/2012", "01/08/2012", "01/08/2012",
   "Sunny", "01/05/2012", "01/08/2012", "01/08/2012",
   "Sunny", "01/08/2012", "01/09/2012", "01/08/2012"
  ) %>% 
  mutate(admission = dmy(admission), discharged = dmy(discharged), Index_date = dmy(Index_date))

Below is my expected result

expected_result <- tibble::tribble(
  ~patient,   ~admission,  ~discharged,  ~Index_date,
    "John", "01/04/2011", "01/06/2011", "01/08/2011",
    "John", "01/06/2011", "01/08/2011", "01/08/2011",
    "John", "01/08/2011", "01/10/2011", "01/08/2011",
   "kevin", "01/10/2012", "01/12/2012", "01/12/2012",
     "Sam", "01/10/2012", "01/01/2013", "01/01/2013",
     "Sam", "01/01/2013", "01/03/2013", "01/01/2013",
    "Matt", "01/05/2012", "01/06/2012", "01/06/2012",
   "Sunny", "01/01/2012", "01/05/2012", "01/08/2012",
   "Sunny", "01/05/2012", "01/08/2012", "01/08/2012",
   "Sunny", "01/05/2012", "01/08/2012", "01/08/2012",
   "Sunny", "01/08/2012", "01/09/2012", "01/08/2012"
  )%>% 
  mutate(admission = dmy(admission), discharged = dmy(discharged), Index_date = dmy(Index_date))

I have 1 million records like this. I'd be grateful to you for any help on solving this problem.

Here's an approach.
First, I mark which visits start a new continuous series of visits, i.e. where the admission date is later than the prior discharge. Then I count those and mark the "exit" visits.
To create the output, I filter for just the "exit" visits, picking just one where there are repeats (like for Sunny's middle visits), and joining those back to the first table, so we get all the other visits associated with the exit one.

sample_mark_visits <- sample_date %>%
  group_by(patient) %>%
  mutate(new_visit = admission > lag(discharged, default = 0),
         visit_num = cumsum(new_visit),
         exit = Index_date == discharged)

output <- sample_mark_visits %>%
  filter(exit) %>%
  distinct(patient, visit_num) %>%
  left_join(sample_mark_visits)

Output:

> output
# A tibble: 11 x 7
# Groups:   patient [5]
   patient visit_num admission  discharged Index_date new_visit exit 
   <chr>       <int> <date>     <date>     <date>     <lgl>     <lgl>
 1 John            1 2011-04-01 2011-06-01 2011-08-01 TRUE      FALSE
 2 John            1 2011-06-01 2011-08-01 2011-08-01 FALSE     TRUE 
 3 John            1 2011-08-01 2011-10-01 2011-08-01 FALSE     FALSE
 4 kevin           1 2012-10-01 2012-12-01 2012-12-01 TRUE      TRUE 
 5 Sam             2 2012-10-01 2013-01-01 2013-01-01 TRUE      TRUE 
 6 Sam             2 2013-01-01 2013-03-01 2013-01-01 FALSE     FALSE
 7 Matt            1 2012-05-01 2012-06-01 2012-06-01 TRUE      TRUE 
 8 Sunny           1 2012-01-01 2012-05-01 2012-08-01 TRUE      FALSE
 9 Sunny           1 2012-05-01 2012-08-01 2012-08-01 FALSE     TRUE 
10 Sunny           1 2012-05-01 2012-08-01 2012-08-01 FALSE     TRUE 
11 Sunny           1 2012-08-01 2012-09-01 2012-08-01 FALSE     FALSE
4 Likes

There's an important data structure design lurking here: Is your unit of observation the patient or is it a dated event?

~Index_dates seem to be associated one to a patient, while a patient may have multiple ~admission and ~discharge dates.

Where you should be aiming, I think, is to make ~admission and ~discharge columns of lists if your question is

What is John's index date and what were his admission and discharge dates? Or "how many patients were admitted on 2012-01-10?"

Vectors of lists aren't anyone's idea of a jolly good time, but doable, but first let's be sure that I have the right understanding of the problem.

@jonspring you're too good. Elegant solution. Thank you for taking time to explain your approach, I think its very important.

Each observation in the data set is patients with variables such as admission and discharge dates. However, each patient are provided a unique Index date.

I am not well versed with data structure design but I'd love to find out more. Thank you for taking your time to help me out.

The classic paper explains the principle of tidy design and contrasts that with other approaches. This forum is strongly oriented to a tool, RStudio that supports the tidyverse, a collection of programs that implements that design.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.