Median days admitted to hospital based on year

Hi everyone, I am new to R and I thank you ahead for any help!

I have created a MasterTable with a large hospital dataset that includes different demographic data for children who are admitted that includes year and length of hospital stay (LOS) in days. I want to find out the median LOS per year. In the sample dataset, I only made it span 5 years, but the actual dataset spans many more years. How do I figure out the median without creating a separate table where I filter out the dataset per year?

Dataset:

MasterTable <- data.frame(
  Patient.ID = c("A", "B", "C", "C", "C", "D", "D", "E", "F", "G", "G"),
  Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd", "2Aa", "6Rr", "8Ee", "7Pp"),
  Gender = c("Female", "Male", "Male", "Male", "Male", "Female", "Female", "Male", "Female", "Male", "Male"),
  Died = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0),
  Discharge.year = c("2017", "2014", "2015", "2016", "2015", "2017", "2018", "2014", "2018", "2015", "2016"),
  LOS.Days = c(0.59, 4.35, 1.89, 6.02, 2.76, 1.99, 0.96, 11.52, 1.57, 4.16, 3.74))

Created on 2021-04-23 by the reprex package (v2.0.0)

Here is one solution using the dplyr packdage.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
MasterTable <- data.frame(
  Patient.ID = c("A", "B", "C", "C", "C", "D", "D", "E", "F", "G", "G"),
  Admit.ID = c("1Zz", "1Yy", "5Pp", "3Cc", "9Dd", "4Yy", "4Dd", "2Aa", "6Rr", "8Ee", "7Pp"),
  Gender = c("Female", "Male", "Male", "Male", "Male", "Female", "Female", "Male", "Female", "Male", "Male"),
  Died = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0),
  Discharge.year = c("2017", "2014", "2015", "2016", "2015", "2017", "2018", "2014", "2018", "2015", "2016"),
  LOS.Days = c(0.59, 4.35, 1.89, 6.02, 2.76, 1.99, 0.96, 11.52, 1.57, 4.16, 3.74))

Medians <- MasterTable %>% group_by(Discharge.year) %>% 
  summarize(MedianStay = median(LOS.Days))
#> `summarise()` ungrouping output (override with `.groups` argument)
Medians
#> # A tibble: 5 x 2
#>   Discharge.year MedianStay
#>   <chr>               <dbl>
#> 1 2014                 7.94
#> 2 2015                 2.76
#> 3 2016                 4.88
#> 4 2017                 1.29
#> 5 2018                 1.27

Created on 2021-04-23 by the reprex package (v0.3.0)

Thank you that worked perfectly!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.