Counting the Number of People in a Hotel

I am working with the R programming language. Suppose there is a hotel that has a list of customers with their check-in and check-out times (Note: The actual value of the dates is "POSIXct" and is written as "year-month-date".):

check_in_date <- c('2010-01-01', '2010-01-02' ,'2010-01-01', '2010-01-08', '2010-01-08', '2010-01-15', '2010-01-15', '2010-01-16', '2010-01-19', '2010-01-22')
check_out_date <- c('2010-01-07', '2010-01-04' ,'2010-01-09', '2010-01-21', '2010-01-11', '2010-01-22', 'still in hotel as of today', '2010-01-20', '2010-01-25', '2010-01-29')
Person = c("John", "Smith", "Alex", "Peter", "Will", "Matt", "Tim", "Kevin", "Tom", "Adam")

hotel <- data.frame(check_in_date, check_out_date, Person )

The data looks like something like this:

   check_in_date             check_out_date Person
1     2010-01-01                 2010-01-07   John
2     2010-01-02                 2010-01-04  Smith
3     2010-01-01                 2010-01-09   Alex
4     2010-01-08                 2010-01-21  Peter
5     2010-01-08                 2010-01-11   Will
6     2010-01-15                 2010-01-22   Matt
7     2010-01-15 still in hotel as of today    Tim
8     2010-01-16                 2010-01-20  Kevin
9     2010-01-19                 2010-01-25    Tom
10    2010-01-22                 2010-01-29   Adam

Question: I am trying to find out on any given day, how many people were still in the hotel. This would look something like this (just an example, does not correspond to the above data):

  day_of_the_year Number_of_people_currently_in_hotel
1      2010-01-01                                   1
2      2010-01-02                                   1
3      2010-01-03                                   2
4      2010-01-04                                   0
5      2010-01-05                                   5
6      2010-01-06                                   5
7      2010-01-07                                   2
8      2010-01-08                                   2
9      2010-01-09                                   8

I tried to solve this problem in 3 steps:

First Step : I generated a column containing every date from the start to the end (e.g. in this example, let's suppose that there are 31 days : from the start to the end of Jan-2010)

day_of_the_year = seq(as.Date("2010/1/1"), as.Date("2010/1/31"),by="day")

Second Step : I then determined how many people checked in to the hotel at each day:

library(dplyr)

#create some indicator variable 
hotel$event = 1

check_ins = hotel %>% group_by(check_in_date) %>%   summarise(n = n())

 check_in_date     n
  <chr>         <int>
1 2010-01-01        2
2 2010-01-02        1
3 2010-01-08        2
4 2010-01-15        2
5 2010-01-16        1
6 2010-01-19        1
7 2010-01-22        1

Third Step: I then repeated a similar step to determine how many people checked out of the hotel each day:

check_outs = hotel %>% group_by(check_out_date) %>% summarise(n = n())

   check_out_date                 n
   <chr>                      <int>
 1 2010-01-04                     1
 2 2010-01-07                     1
 3 2010-01-09                     1
 4 2010-01-11                     1
 5 2010-01-20                     1
 6 2010-01-21                     1
 7 2010-01-22                     1
 8 2010-01-25                     1
 9 2010-01-29                     1
10 still in hotel as of today     1

Problem: Now, I am not sure how to combine the above 3 Steps in such a way so that we can find out how many people were staying at the hotel each day of the month. Can someone please show me how to do this?

Thanks!

Your pieces could be combined in the following way:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

check_in_date <- c('2010-01-01', '2010-01-02' ,'2010-01-01', '2010-01-08', 
                   '2010-01-08', '2010-01-15', '2010-01-15', '2010-01-16',
                   '2010-01-19', '2010-01-22')
check_out_date <- c('2010-01-07', '2010-01-04' ,'2010-01-09', '2010-01-21',
                    '2010-01-11', '2010-01-22', 'still in hotel as of today', 
                    '2010-01-20', '2010-01-25', '2010-01-29')
Person = c("John", "Smith", "Alex", "Peter", "Will", "Matt", 
           "Tim", "Kevin", "Tom", "Adam")

hotel <- data.frame(check_in_date, check_out_date, Person ) %>%
  mutate (check_in_date =lubridate::ymd(check_in_date),
          check_out_date =lubridate::ymd(check_out_date))
#> Warning: 1 failed to parse.

all_days <- data.frame(date=
     seq(as.Date("2010/1/1"), as.Date("2010/1/31"),by="day")
)

check_ins  <- hotel %>% group_by(check_in_date) %>%   summarise(c_in = n())
check_outs <- hotel %>% group_by(check_out_date) %>% summarise(c_out = n()) %>%
  filter(!is.na(check_out_date))

all_data   <- all_days %>%
  left_join(check_ins, by=c(date='check_in_date')) %>%
  left_join(check_outs, by=c(date='check_out_date')) %>%
  mutate(c_in  = ifelse(is.na(c_in),0,c_in),
         c_out = ifelse(is.na(c_out),0,c_out),
         saldo = c_in - c_out,
         present = cumsum(saldo))

head(all_data)
#>         date c_in c_out saldo present
#> 1 2010-01-01    2     0     2       2
#> 2 2010-01-02    1     0     1       3
#> 3 2010-01-03    0     0     0       3
#> 4 2010-01-04    0     1    -1       2
#> 5 2010-01-05    0     0     0       2
#> 6 2010-01-06    0     0     0       2
Created on 2021-09-16 by the reprex package (v2.0.0)
1 Like

thank you! this worked perfectly! I then used the following code to plot the results:

library(ggplot2)
ggplot(all_data, aes(date, present)) + geom_step()