How to make a table containing the counts per hour, even when the count is 0?

Hi all

It's about a dataset containing chemical tests on blood samples and their result times. With the aggregate function, I count the number of tests done in each hour.

I have this kind of output dataframe:
hours | count
0 | 256
1 | 357
2 | 290
3 | 458
...
...
22 | 674
23 | 241

When a test is done every hour, each hour is listed. But imagine that the lab closes (no tests are done from 7h to 14h): then there is a gap, and the resulting barplot is wrong because it shows fewer than 24 bars. For example, I have a test that is done twice a day, so when I filter the dataset on that test and aggregate again, I get a dataframe with only 2 rows and thus a barplot with only 2 bars (and not the 24 bars I want).

hour | count
17h | 769
21h | 564

How can I make the hour column run from 0 to 23, adding a 0 for any hour without counts, so that the bar plot has 24 bars (even when a bar represents 0)?

Thanks in advance!

One approach would be to start with a base set of hours from 0 to 23 that you left_join() your data to. Then, fill in NAs with 0.

library(tidyverse)

base_hours = data.frame(hours = 0:23)

mydf = data.frame(hours = c(0, 1, 2, 3, 8),
                  count = c(100, 231, 74, 88, 102))

left_join(base_hours, mydf) %>%
  mutate(count = ifelse(is.na(count), 0, count))
#> Joining, by = "hours"
#>    hours count
#> 1      0   100
#> 2      1   231
#> 3      2    74
#> 4      3    88
#> 5      4     0
#> 6      5     0
#> 7      6     0
#> 8      7     0
#> 9      8   102
#> 10     9     0
#> 11    10     0
#> 12    11     0
#> 13    12     0
#> 14    13     0
#> 15    14     0
#> 16    15     0
#> 17    16     0
#> 18    17     0
#> 19    18     0
#> 20    19     0
#> 21    20     0
#> 22    21     0
#> 23    22     0
#> 24    23     0
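
As a rough sketch, the same join applied to the two-row case from the question could look like this. The name sparse_df is made up for illustration, and coalesce() from dplyr is swapped in for the ifelse() call; both should give the same result here.

```r
library(dplyr)

# Hypothetical aggregated data for a test that is only run at 17h and 21h
# (the two counts are the ones shown in the question)
sparse_df <- data.frame(hours = c(17L, 21L),
                        count = c(769, 564))

base_hours <- data.frame(hours = 0:23)

# Keep every hour from base_hours; hours without a match get NA, then 0
full_df <- left_join(base_hours, sparse_df, by = "hours") %>%
  mutate(count = coalesce(count, 0))

nrow(full_df)
#> [1] 24
```

Passing by = "hours" explicitly also silences the "Joining, by" message.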

Another approach is to convert the hour to a factor with all 24 levels, then summarize the data with the count() function from dplyr, setting the .drop argument to FALSE so that any missing hours are listed with a zero.

library(ggplot2)
library(dplyr)

#Make a data frame with no results in hour 19
DF <- data.frame(hour=sample(c(0:18,20:23),100,replace = TRUE))
DF$hour <- factor(DF$hour,levels = 0:23)
DF <- count(DF,hour,.drop = FALSE)
DF
#>    hour n
#> 1     0 3
#> 2     1 6
#> 3     2 3
#> 4     3 7
#> 5     4 7
#> 6     5 5
#> 7     6 5
#> 8     7 4
#> 9     8 6
#> 10    9 4
#> 11   10 5
#> 12   11 3
#> 13   12 4
#> 14   13 5
#> 15   14 2
#> 16   15 3
#> 17   16 1
#> 18   17 6
#> 19   18 4
#> 20   19 0
#> 21   20 4
#> 22   21 7
#> 23   22 3
#> 24   23 3
ggplot(DF,aes(hour,n))+geom_col()

Created on 2022-10-19 with reprex v2.0.2
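
The fixed-levels idea should also work without dplyr, since table() reports every factor level, including empty ones. A minimal base R sketch, assuming the raw data has one row per test; raw_hours is a made-up vector standing in for the hour of each result time:

```r
# Hypothetical raw data: 50 tests, all done at either 17h or 21h
set.seed(1)
raw_hours <- sample(c(17, 21), 50, replace = TRUE)

# Fixing the levels to 0:23 makes table() list the empty hours with a 0
hour_counts <- table(factor(raw_hours, levels = 0:23))

length(hour_counts)
#> [1] 24

# barplot() accepts the table directly and draws one slot per hour
barplot(hour_counts, xlab = "hour", ylab = "count")
```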

Yet another approach, using complete() from tidyr:

library(dplyr)
library(tidyr)

sample_df <- data.frame(
       hours = c(0, 1, 2, 3, 22, 23),
       count = c(256, 357, 290, 458, 674, 241)
)

sample_df %>% 
    complete(hours = full_seq(hours, 1), fill = list("count" = 0))
#> # A tibble: 24 × 2
#>    hours count
#>    <dbl> <dbl>
#>  1     0   256
#>  2     1   357
#>  3     2   290
#>  4     3   458
#>  5     4     0
#>  6     5     0
#>  7     6     0
#>  8     7     0
#>  9     8     0
#> 10     9     0
#> # … with 14 more rows

Created on 2022-10-19 with reprex v2.0.2
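
One caveat: full_seq(hours, 1) only spans from the smallest to the largest hour that is present, so for a test done only at 17h and 21h it would give 17 to 21, not 0 to 23. A sketch that spells out the full range instead, with sparse_df as a made-up example data frame:

```r
library(dplyr)
library(tidyr)

# Hypothetical aggregated data with only two hours present
sparse_df <- data.frame(hours = c(17L, 21L),
                        count = c(769, 564))

# Give complete() the whole day explicitly rather than deriving it from the data
full_day <- sparse_df %>%
  complete(hours = 0:23, fill = list(count = 0))

nrow(full_day)
#> [1] 24
```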

Can you paste a small part of your data frame, along with the script you used? You can get the sample with dput(head(yourdf, 20))

Anyway, this code should be fine:

library(tidyverse)

a <- tibble(
    h = c(7:12, 14:21),
    count = sample(100:300,14)
)

a %>% 
  ggplot(aes(h, count)) +
  geom_col() +
  scale_x_continuous(breaks = seq(0,24,1))

This is the result

So we have to create a full day:

all_h <- tibble(h = 0:23)

a_new <- full_join(a, all_h, by = "h") %>% 
  arrange(h) %>% 
  mutate(h = factor(h, levels = 0:23),
         count = ifelse(is.na(count), 0, count))

a_new %>% 
  ggplot(aes(h, count)) +
  geom_col()

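But the bars with a count of 0 have no height, so they aren't displayed. If you want them to be visible, you can replace 0 with a small value (keep in mind that the plot then shows 1 where the real count is 0):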

a_new %>% 
  mutate(count = ifelse(count == 0, 1, count)) %>% 
  ggplot(aes(h, count)) +
  geom_col()

Thank you all. I used this code, based on the 4th answer!

all_h <- tibble(hour = 0:23)

df_plot = merge(x=all_h,y=df_aggr_Result,by="hour",all=TRUE)
df_plot[is.na(df_plot)] <- 0
barplot(height = df_plot$`amount of samples`,names.arg = df_plot$hour)
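
For readers without the original data, here is a self-contained sketch of the same approach. The contents of df_aggr_Result are made up (only the two counts come from the question), and a plain data.frame is used for all_h so that no extra package is needed.

```r
# Hypothetical stand-in for the aggregated result of a test done at 17h and 21h
df_aggr_Result <- data.frame(hour = c(17L, 21L),
                             `amount of samples` = c(769, 564),
                             check.names = FALSE)  # keep the spaces in the column name

all_h <- data.frame(hour = 0:23)

# all = TRUE keeps every hour from both sides; unmatched hours get NA
df_plot <- merge(x = all_h, y = df_aggr_Result, by = "hour", all = TRUE)
df_plot[is.na(df_plot)] <- 0

barplot(height = df_plot$`amount of samples`, names.arg = df_plot$hour)
```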

Thank you all for your fast response!
