Aggregate by hour of day

Hi,

My data looks like this. It is hourly data covering all 24 hours of each day (00:00:00 to 23:00:00):

# A tibble: 6 × 5
  date                   BC6 PM2.5_street PM2.5_hurst_road NCORE_PM2P5
  <dttm>               <dbl>        <dbl>            <dbl>       <dbl>
1 2022-01-01 00:00:00 1.02             NA               13          23
2 2022-01-01 01:00:00 0.120            NA                7           4
3 2022-01-01 02:00:00 0.136            NA                4           2
4 2022-01-01 03:00:00 0.0376           NA                3          NA
5 2022-01-01 04:00:00 0.0372           NA                6           1
6 2022-01-01 05:00:00 0.0326           NA                6           2

# A tibble: 6 × 5
  date                  BC6 PM2.5_street PM2.5_hurst_road NCORE_PM2P5
  <dttm>              <dbl>        <dbl>            <dbl>       <dbl>
1 2022-01-21 18:00:00    NA           22                5          22
2 2022-01-21 19:00:00    NA           26               12          23
3 2022-01-21 20:00:00    NA           20               18          22
4 2022-01-21 21:00:00    NA           26               20          21
5 2022-01-21 22:00:00    NA           17                8          14
6 2022-01-21 23:00:00    NA           18               13          11

I want to generate only the mean for each hour of the day (24 points).
I want the result to look like this:

  date                  BC6 PM2.5_street PM2.5_hurst_road NCORE_PM2P5
  <dttm>              <dbl>        <dbl>            <dbl>       <dbl>
1 00:00:00           
2 01:00:00   
3 02:00:00           
4 03:00:00
5 04:00:00
6 05:00:00
7 06:00:00
.
.
24 23:00:00

I tried

aggregate(hourly_mean['BC6'], list(cut(hourly_mean$date, "24 hour")), mean, na.rm=TRUE) -> new_magee1

But it gives daily data:

> head(new_magee1)
     Group.1       BC6
1 2022-01-01 0.3035197
2 2022-01-02 0.7962944
3 2022-01-03 1.0200813
4 2022-01-04 2.1767271
5 2022-01-05 1.9366458
6 2022-01-06 1.5590944
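
The call above returns one row per day because cut(date, "24 hour") bins the timestamps into consecutive 24-hour intervals. A minimal base R sketch of grouping on the hour of day instead follows; the data frame `toy` and its values are made up for illustration, and only the column names `date` and `BC6` follow the data above.

```r
# Toy data: 3 days of hourly timestamps (hypothetical values, not the real data)
toy <- data.frame(
  date = seq(as.POSIXct("2022-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 72)
)
toy$BC6 <- rep(c(1, 2, 3), each = 24)  # one constant value per day
toy$BC6[1] <- NA                        # a missing value, as in the real data

# Group on the hour-of-day label (e.g. "00:00:00") rather than on 24-hour bins
hour_label <- format(toy$date, "%H:00:00")
hourly <- aggregate(toy["BC6"], list(hour = hour_label), mean, na.rm = TRUE)

nrow(hourly)  # one row per hour of day
```

Because the grouping key no longer contains the calendar date, the same hour on different days falls into one group.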

I recommend something like this:

library(lubridate)
library(tidyverse)

df %>% 
  mutate(hr = floor_date(date, unit = "hour")) %>%
  group_by(hr) %>%
  summarize_if(is.numeric, mean, na.rm = TRUE) %>%
  ungroup()

After applying your code, the data does not change; it looks like the original data. I could not get just the 24 hourly mean points.

# A tibble: 504 × 5
   hr                     BC6 PM2.5_street PM2.5_hurst_road NCORE_PM2P5
   <dttm>               <dbl>        <dbl>            <dbl>       <dbl>
 1 2022-01-01 00:00:00 1.02            NaN               13          23
 2 2022-01-01 01:00:00 0.120           NaN                7           4
 3 2022-01-01 02:00:00 0.136           NaN                4           2
 4 2022-01-01 03:00:00 0.0376          NaN                3         NaN
 5 2022-01-01 04:00:00 0.0372          NaN                6           1
 6 2022-01-01 05:00:00 0.0326          NaN                6           2
 7 2022-01-01 06:00:00 0.0511          NaN                6           3
 8 2022-01-01 07:00:00 0.06            NaN                6           3
 9 2022-01-01 08:00:00 0.0672          NaN                3           2
10 2022-01-01 09:00:00 0.285           NaN                1           9
# … with 494 more rows
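
The output is unchanged because floor_date(date, unit = "hour") rounds each timestamp down to the start of its hour but keeps the calendar date, so data that is already hourly ends up with one group per row. A small sketch contrasting it with hour(), assuming lubridate is installed; the two timestamps are made up:

```r
library(lubridate)

# Two made-up timestamps at the same hour of day on different days
x <- as.POSIXct(c("2022-01-01 15:00:00", "2022-01-02 15:00:00"), tz = "UTC")

# floor_date() keeps the date, so the two values remain distinct keys
n_floor <- length(unique(floor_date(x, unit = "hour")))

# hour() drops the date, so both values collapse to the same key
n_hour <- length(unique(hour(x)))

c(floor_date = n_floor, hour = n_hour)
```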

My data can be found here:

https://www.dropbox.com/t/aNuZmS7Zo2F6bfN3

Are you looking to average all observations of 3pm hours across all days, for example?

Here's an example of how to do that.


library(tidyverse)
library(lubridate)  # provides hour(); not attached by older tidyverse versions
data.frame(date = as.POSIXct("2022-01-01 00:00:00") + (60*60)*0:1000,
           val1 = 0:1000,
           val2 = -1000:0) %>%
  group_by(hour = hour(date)) %>%
  summarize(across(val1:val2, mean))

Result


# A tibble: 24 x 3
    hour  val1  val2
   <int> <dbl> <dbl>
 1     0   492  -508
 2     1   493  -507
 3     2   494  -506
 4     3   495  -505
 5     4   496  -504
 6     5   497  -503
 7     6   498  -502
 8     7   499  -501
 9     8   500  -500
10     9   501  -499
# … with 14 more rows
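
The example above has no missing values. A hedged sketch of the same grouping applied to a column containing NA is given below; the data frame `toy` is a made-up stand-in for the real data, and passing na.rm = TRUE through a lambda is one possible way to skip the missing values.

```r
library(dplyr)
library(lubridate)

# Made-up stand-in for the real data: 2 days of hourly values with one NA
toy <- data.frame(
  date = as.POSIXct("2022-01-01 00:00:00", tz = "UTC") + 3600 * 0:47,
  BC6 = c(NA, rep(1, 23), rep(3, 24))
)

by_hour <- toy %>%
  group_by(hour = hour(date)) %>%
  # Average every numeric column within each hour, skipping missing values
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

nrow(by_hour)  # one row per hour of day
```

Without na.rm = TRUE, any hour that contains a missing value would have a mean of NA.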

Thanks for your suggestion.
I want the final output to look like this:

hour	value
0:00:00	
1:00:00	
2:00:00	
3:00:00	
4:00:00	
5:00:00	
6:00:00	
7:00:00	
8:00:00	
9:00:00	
10:00:00	
11:00:00	
12:00:00	
13:00:00	
14:00:00	
15:00:00	
16:00:00	
17:00:00	
18:00:00	
19:00:00	
20:00:00	
21:00:00	
22:00:00	
23:00:00	

Can you refresh to see my last edit? Is the issue that you want to represent the hour in the format 0:00:00 instead of 0?


Thanks! That's what I was looking for!
Thanks for your time.

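df %>%
  # Extract the hour of day (0-23) so the same hour on every day shares a key
  mutate(hr = hour(as_datetime(date))) %>%
  group_by(hr) %>%
  # Relabel the grouping key as zero-padded "HH:00:00" text
  mutate(hr = ifelse(hr < 10, paste0("0", hr, ":00:00"), paste0(hr, ":00:00"))) %>%
  # Average every numeric column within each hour, ignoring missing values
  summarize_if(is.numeric, mean, na.rm = TRUE)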

Thanks for your time!
