Calculate Daily Census from admit and discharge date range

Hello, I am fairly new to R so there are many probably basic things that I don't know how to do. I have a dataset with hospital account number (HAR), patient id (MRN), admit date, and discharge date. Each hospital encounter gets a unique number. Each patient has a unique identifier. A given patient can readmit and thus have multiple HARs but every HAR has only one MRN.

I want to generate a patient census as a table that with a distinct count of HARs by date. So, if one HAR has admission date March 1 2021 and discharge date March 3 2021, then that encounter should be counted in March 1, March 2, and March 3.

I have no idea how to do this or even where to get started. Any help would be much appreciated. Note for my purposes the times do not matter, just the dates. Ultimately what I want is something like this:

Date     Census
1/1/2021    6
1/2/2021    7
1/3/2021    4

I don't have any code as of yet, but here are sample data:

data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
               HAR = c("HAR1","HAR2","HAR3","HAR4",
                       "HAR5","HAR6","HAR7","HAR8","HAR9","HAR10","HAR11",
                       "HAR12","HAR13","HAR14","HAR15","HAR16","HAR17",
                       "HAR18","HAR19","HAR20"),
               MRN = c("Pat1","Pat2","Pat3","Pat4",
                       "Pat5","Pat6","Pat6","Pat7","Pat8","Pat9","Pat10",
                       "Pat10","Pat11","Pat12","Pat13","Pat14","Pat14",
                       "Pat15","Pat16","Pat17"),
       `Admit DTS` = c("2020-12-05 17:56:00",
                       "2020-11-27 06:47:00","2020-11-24 16:49:00",
                       "2020-10-07 22:44:00","2021-01-08 14:37:00","2020-11-10 13:28:00",
                       "2020-11-25 09:54:00","2020-12-06 11:00:00",
                       "2020-10-06 16:14:00","2020-11-24 17:01:00","2020-11-14 01:26:00",
                       "2020-11-24 20:18:00","2020-11-16 11:26:00",
                       "2020-11-12 16:11:00","2020-10-20 18:24:00","2020-11-02 16:14:00",
                       "2020-11-11 21:00:00","2020-10-15 10:30:00",
                       "2020-12-15 21:26:00","2021-01-05 16:18:00"),
   `Discharge DTS` = c("2020-12-09 14:22:00",
                       "2020-12-01 14:32:00","2020-11-27 14:20:00",
                       "2020-10-12 13:45:00","2021-01-13 13:30:00","2020-11-12 17:35:00",
                       "2020-11-29 10:19:00","2020-12-08 15:30:00",
                       "2020-10-22 15:00:00","2020-11-30 13:00:00","2020-11-16 14:45:00",
                       "2020-12-03 11:25:00","2020-11-27 15:27:00",
                       "2020-11-24 14:47:00","2020-10-26 14:53:00","2020-11-11 20:20:00",
                       "2020-11-18 16:12:00","2020-10-18 11:30:00",
                       "2020-12-18 14:23:00","2021-01-07 13:41:00")
)

I'm unsure of the result because the data don't exercise it properly (all combinations of HAR and MRN have only single instances).

Some comments:

  1. Avoid having to tick-quote variable names; it's easy to make them more descriptive when it comes time to report
  2. I've used dat as the object name in preference to df or data. Because some operations treat a name that is also a built-in function, you sometimes get a "can't subset closure" error, meaning that the namespace conflict is resolved in favor of the function.
suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

dat <- data.frame(
  HAR = c(
    "HAR1", "HAR2", "HAR3", "HAR4",
    "HAR5", "HAR6", "HAR7", "HAR8", "HAR9", "HAR10", "HAR11",
    "HAR12", "HAR13", "HAR14", "HAR15", "HAR16", "HAR17",
    "HAR18", "HAR19", "HAR20"
  ),
  MRN = c(
    "Pat1", "Pat2", "Pat3", "Pat4",
    "Pat5", "Pat6", "Pat6", "Pat7", "Pat8", "Pat9", "Pat10",
    "Pat10", "Pat11", "Pat12", "Pat13", "Pat14", "Pat14",
    "Pat15", "Pat16", "Pat17"
  ),
  Admit_DTS = c(
    "2020-12-05 17:56:00",
    "2020-11-27 06:47:00", "2020-11-24 16:49:00",
    "2020-10-07 22:44:00", "2021-01-08 14:37:00", "2020-11-10 13:28:00",
    "2020-11-25 09:54:00", "2020-12-06 11:00:00",
    "2020-10-06 16:14:00", "2020-11-24 17:01:00", "2020-11-14 01:26:00",
    "2020-11-24 20:18:00", "2020-11-16 11:26:00",
    "2020-11-12 16:11:00", "2020-10-20 18:24:00", "2020-11-02 16:14:00",
    "2020-11-11 21:00:00", "2020-10-15 10:30:00",
    "2020-12-15 21:26:00", "2021-01-05 16:18:00"
  ),
  Discharge_DTS = c(
    "2020-12-09 14:22:00",
    "2020-12-01 14:32:00", "2020-11-27 14:20:00",
    "2020-10-12 13:45:00", "2021-01-13 13:30:00", "2020-11-12 17:35:00",
    "2020-11-29 10:19:00", "2020-12-08 15:30:00",
    "2020-10-22 15:00:00", "2020-11-30 13:00:00", "2020-11-16 14:45:00",
    "2020-12-03 11:25:00", "2020-11-27 15:27:00",
    "2020-11-24 14:47:00", "2020-10-26 14:53:00", "2020-11-11 20:20:00",
    "2020-11-18 16:12:00", "2020-10-18 11:30:00",
    "2020-12-18 14:23:00", "2021-01-07 13:41:00"
  )
)

dat <- dat %>% mutate(Admit_DTS = ymd_hms(Admit_DTS),
               Discharge_DTS = ymd_hms(Discharge_DTS))

dat %>% mutate(elapsed = floor(Discharge_DTS - Admit_DTS)) %>%
  group_by(HAR) %>% summarize(Census = sum(elapsed))
#> # A tibble: 20 x 2
#>    HAR   Census 
#>    <chr> <drtn> 
#>  1 HAR1   3 days
#>  2 HAR10  5 days
#>  3 HAR11  2 days
#>  4 HAR12  8 days
#>  5 HAR13 11 days
#>  6 HAR14 11 days
#>  7 HAR15  5 days
#>  8 HAR16  9 days
#>  9 HAR17  6 days
#> 10 HAR18  3 days
#> 11 HAR19  2 days
#> 12 HAR2   4 days
#> 13 HAR20  1 days
#> 14 HAR3   2 days
#> 15 HAR4   4 days
#> 16 HAR5   4 days
#> 17 HAR6   2 days
#> 18 HAR7   4 days
#> 19 HAR8   2 days
#> 20 HAR9  15 days

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.