How to find the average consumption by hours

Hi,
I am new at coding so something as simple as doing a condition means seems hard.

In fact, what I am trying to do is to have the average consumption by hours (& for each year, but this is ok).
My data are like this :
image

I have a hard time finding the i e hours average/year.

I found this on the web :
mean(HQ$2009|HQ$heure == 2, na.rm = TRUE)
[1] 1
Which 1 is not a possible value.

After digging on the web (conditional mean) I found the tidyverse, but I don't know if it's really this that I want :

hours_groups <- group_by(test_2, heure)
summarise(hours_groups, c_count_mean = mean(hours_groups, na.rm = TRUE), n = n())
The warning : In mean.default(hours_groups, na.rm = TRUE) :
argument is not numeric or logical: returning NA

I am a bit lost, because I would have been able to do it in 4 lines in stata. (with a for & if loop) :
for(h in hours) {
if (h == HQ$heure) {
mean(HQ$2009, na.rm = TRUE)
}
}

Wich is not working, but it was my first try since it was my stata's intuitions.

This is how I would find the means by hour. I assume the values in the columns that look like years are consumption.

library(tidyverse)

# Making some data up

HQ <- tibble(heure=rep(1:12, 31),
             jour=rep(1:31, each=12),
             mois=1) %>%
  mutate(`2009`=rnorm(n(), 23000, 300),
         `2010`=rnorm(n(), 23000, 300),
         `2011`=rnorm(n(), 23000, 300),
         `2012`=rnorm(n(), 23000, 300),
         `2013`=rnorm(n(), 23000, 300))

head(HQ)
#> # A tibble: 6 x 8
#>   heure  jour  mois `2009` `2010` `2011` `2012` `2013`
#>   <int> <int> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     1     1 22922. 23011. 23275. 22538. 22939.
#> 2     2     1     1 23250. 22667. 22946. 23106. 22900.
#> 3     3     1     1 22730. 22576. 23455. 22340. 22569.
#> 4     4     1     1 23069. 22894. 23300. 23335. 23110.
#> 5     5     1     1 22664. 23503. 22624. 22586. 23395.
#> 6     6     1     1 23202. 23630. 22753. 23045. 22400.
tail(HQ)
#> # A tibble: 6 x 8
#>   heure  jour  mois `2009` `2010` `2011` `2012` `2013`
#>   <int> <int> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1     7    31     1 22684. 22803. 22764. 22967. 23794.
#> 2     8    31     1 23115. 23217. 23167. 22922. 23134.
#> 3     9    31     1 23054. 22809. 22785. 23491. 23015.
#> 4    10    31     1 23195. 23087. 23380. 23233. 23090.
#> 5    11    31     1 23110. 22733. 23053. 22773. 22764.
#> 6    12    31     1 22911. 23056. 23184. 22862. 23655.

HQ %>% 
  group_by(heure) %>% # summarise by each hour
  summarise(mean2009=mean(`2009`),
            mean2010=mean(`2010`),
            mean2011=mean(`2011`),
            mean2012=mean(`2012`),
            mean2013=mean(`2013`)
            )
#> # A tibble: 12 x 6
#>    heure mean2009 mean2010 mean2011 mean2012 mean2013
#>    <int>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#>  1     1   22924.   23033.   23029.   22861.   22996.
#>  2     2   23010.   22971.   23065.   23017.   23049.
#>  3     3   23078.   22915.   22977.   22908.   23027.
#>  4     4   23020.   22995.   23084.   22948.   22939.
#>  5     5   22955.   23062.   23024.   22958.   23098.
#>  6     6   22869.   23005.   22945.   23045.   22935.
#>  7     7   22915.   22974.   22986.   22993.   23052.
#>  8     8   22972.   23010.   22949.   22976.   23053.
#>  9     9   23013.   23014.   22921.   23041.   23057.
#> 10    10   22994.   22968.   22989.   22996.   23042.
#> 11    11   23069.   22906.   23025.   22984.   22910.
#> 12    12   22934.   23005.   23097.   22980.   23015.

Created on 2020-05-07 by the reprex package (v0.3.0)

2 Likes

In fact, it was the Tidyverse. I was stuck with it. Hence I read : https://r4ds.had.co.nz/
within an hour I found the simple one command that I needed.

For every begginner that aim to do data description, only one night of reading will make you save hours.

This was the lines :

library(tidyverse)
library(dplyr)

X2009_2018_heure <- read_excel("C:Dropbox/Projet Éolien/Data/HQ/2009_2018_heure.xlsx")

HQ <- X2009_2018_heure %>%
pivot_longer(c(2009,2010,2011,2012,2013,2014,2015,2016,2017,2018), names_to = "Year", values_to = "Consumption")

Hence After I can subdivise it as much as I want:

HQ_H_Y <- HQ %>%
group_by(Year,heure) %>%
summarise(hours_mean = mean(Consumption, na.rm = TRUE))

This one gives :
Year heure hours_mean

2009 1 17994.
2009 2 17828.
2009 3 17815.

HQ_H_M_Y_Q_1 <- HQ %>%
group_by(mois,heure,Year) %>%
dplyr::summarise( Quantile1 = quantile(Consumption, c(.10), na.rm = TRUE),
Quantile2 = quantile(Consumption, c(.20), na.rm = TRUE),
Quantile3 = quantile(Consumption, c(.30), na.rm = TRUE),
Quantile4 = quantile(Consumption, c(.40), na.rm = TRUE),
Quantile5 = quantile(Consumption, c(.50), na.rm = TRUE),
Quantile6 = quantile(Consumption, c(.60), na.rm = TRUE),
Quantile7 = quantile(Consumption, c(.70), na.rm = TRUE),
Quantile8 = quantile(Consumption, c(.80), na.rm = TRUE),
Quantile9 = quantile(Consumption, c(.90), na.rm = TRUE),
Quantile10 = quantile(Consumption, c(1.00), na.rm = TRUE))

This one is more complex :

mois heure Year Quantile1 Quantile2 Quantile3 Quantile4 Quantile5 Quantile6 Quantile7 Quantile8 Quantile9 Quantile10
1 1 2009 24089 24386 24793 24967 25226 25937 26586 27767 28104 30374

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.