How to calculate daily average of hourly data and for different variables at the same time

kartik.kaushal · October 21, 2019, 11:44pm

Hello People,
I have the data in the form:

Date_Time            Address      Consumption (Litres/hr)
19/4/2016 13:00      41-2TV            20
20/4/2016 14:00      41-2TV            40
21/4/2016 15:00      41-2TV            16
19/4/2016 13:00      39-2TV            20
20/4/2016 14:00      39-2TV            40
21/4/2016 15:00      39-2TV            16
19/4/2016 13:00      43-2TV            20
20/4/2016 14:00      43-2TV            40
21/4/2016 15:00      43-2TV            16

I have the data for each address for the complete day and so on for complete month, How can I calculate the daily average & then leading to monthly average for different addresses at the same time. I have over 100's of addresses and over 1 year of data, , given on daily basis.

FJCC · October 22, 2019, 12:32am

Here is an example with a data set I invented. It only has two addresses but the method will work with any number. The key functions for grouping and summarizing the data are in the dplyr package.

library(lubridate)
library(dplyr)
#Make a toy data set
StartDateTime = ymd_hm("2019-01-01 00:00")
EndDateTime <- ymd_hm("2019-12-31 23:00")
DF <- data.frame(DateTime = c(seq.POSIXt(StartDateTime, EndDateTime, by = "hour"),
                              seq.POSIXt(StartDateTime, EndDateTime, by = "hour")),
                 Address = rep(c("A", "B"), each = 17520),
                 Consumption = sample(10:50, size = 17520, replace = TRUE)
                 )
head(DF)
#>              DateTime Address Consumption
#> 1 2019-01-01 00:00:00       A          40
#> 2 2019-01-01 01:00:00       A          47
#> 3 2019-01-01 02:00:00       A          45
#> 4 2019-01-01 03:00:00       A          18
#> 5 2019-01-01 04:00:00       A          41
#> 6 2019-01-01 05:00:00       A          30

#Add columns for the Day and Month
DF <- DF %>% mutate(Day = day(DateTime), Month = month(DateTime))
#> Warning: The `printer` argument is deprecated as of rlang 0.3.0.
#> This warning is displayed once per session.
head(DF)
#>              DateTime Address Consumption Day Month
#> 1 2019-01-01 00:00:00       A          40   1     1
#> 2 2019-01-01 01:00:00       A          47   1     1
#> 3 2019-01-01 02:00:00       A          45   1     1
#> 4 2019-01-01 03:00:00       A          18   1     1
#> 5 2019-01-01 04:00:00       A          41   1     1
#> 6 2019-01-01 05:00:00       A          30   1     1

#Calculate daily avg
DayStats <- DF %>% group_by(Address, Month, Day) %>% summarize(DayAvg = mean(Consumption))
head(DayStats)
#> # A tibble: 6 x 4
#> # Groups:   Address, Month [1]
#>   Address Month   Day DayAvg
#>   <fct>   <dbl> <int>  <dbl>
#> 1 A           1     1   33  
#> 2 A           1     2   27.8
#> 3 A           1     3   30.5
#> 4 A           1     4   29.2
#> 5 A           1     5   29  
#> 6 A           1     6   32.9

#Calculate monthly avg
MonthStats <- DF %>% group_by(Address, Month) %>% summarize(MonthAvg = mean(Consumption))
MonthStats
#> # A tibble: 24 x 3
#> # Groups:   Address [?]
#>    Address Month MonthAvg
#>    <fct>   <dbl>    <dbl>
#>  1 A           1     30.2
#>  2 A           2     30.2
#>  3 A           3     30.1
#>  4 A           4     30.0
#>  5 A           5     30.3
#>  6 A           6     29.9
#>  7 A           7     30.0
#>  8 A           8     29.9
#>  9 A           9     29.8
#> 10 A          10     29.9
#> # ... with 14 more rows

^{Created on 2019-10-21 by the reprex package (v0.3.0.9000)}

EconomiCurtis · October 22, 2019, 10:02pm

A post was merged into an existing topic: How to order Day_number and Month_number in ascending order keeping the 1st column unchanged

system · November 12, 2019, 10:02pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.