 # Weighted Average Over Time Series

Hi Everyone,

I'm currently trying to calculate a weighted average using dplyr on a time series every 12 hours. I've writte code that seems to work properly for a normal arithmetic mean. Seen here:

``````AverageMet12<-Hour12Met%>%
filter(LWC >= 0.05)%>%
collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
group_by(FullDate)%>%
summarise_if(is.numeric, mean, na.rm = TRUE)
``````

However, I want to calculated a weighted average, so I would assume that this code would work but it doesn't

``````
AverageMet12<-Hour12Met%>%
filter(LWC >= 0.05)%>%
collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
group_by(FullDate)%>%
summarise_if(is.numeric, weighted.mean(w = CLOUD_HR), na.rm = TRUE)
``````

I get there error:

Is there a better what of doing this?

My variable names include "CLOUD_HR", "LWC", "PSA", and "WDR". I know I have the spelling of my variables correct.

The issue is that when you use summarise_ or mutate_ you cannot call variables by name in the functions. Does every group, FullDate, have the same length and weights? If so, you could save the weights in a vector and pass to the summarise_if function:

``````weight_vec <-  # save the weights here
AverageMet12<-Hour12Met%>%
filter(LWC >= 0.05)%>%
collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
group_by(FullDate)%>%
summarise_if(is.numeric, weighted.mean, w = weight_vec, na.rm = TRUE)
``````

I appreciate the quick response. They all do have the same length within in the dataframe. However, when I run this code, I get this error when setting the weighted vector up outside the pipes like you did:

Error: Problem with `summarise()` input `LWC`.
x 'x' and 'w' must have the same length

I think this is due to the collapse_by function creating vector lengths that are different than the weight vector, but I could be wrong about that.

Ok that means that no, the FullDate groups do not have the same length. Do you need this to be generalized for any reason? For a generalized solution, I think that using something in the purrr package would be the best.

If you don't need it to be generalized, I would just type out the functions in a summarize:

``````AverageMet12<-Hour12Met%>%
filter(LWC >= 0.05)%>%
collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
group_by(FullDate)%>%
summarise(LWC = weighted.mean(LWC, w = CLOUD_HR, na.rm = TRUE))
``````

You might also want to see how many observations you have for each FullDate in summarize by adding n = n()

I'll look more into solving this with purrr (something I haven't tried yet). However, the other solution you give here only gives me the weighted average for the entire data set, not for each group.

Ok, I suggest you look at the output of your code one "pipe" at a time. If the group_by(FullDate) produces only one row, then you only have one group in FullDate. What is the output of:

``````Hour12Met%>%
filter(LWC >= 0.05)%>%
collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
group_by(FullDate)%>%
summarise(n = n())
``````

The following is a MWE for calculating the weighted mean of a column based on another column after collapsing by a time period.

``````library(tidyverse)
library(tibbletime)

data(FB)
FB <- as_tbl_time(FB, date)

# Collapse to weekly dates
collapse_by(FB, "weekly") %>%
group_by(date) %>%
summarize(open = weighted.mean(high,w = close))
``````

Woah, I'm a bit embarrased. I figured out the issue. R was thinking I was using the plyr summarise function rather than the dplyr summarise function. This code worked the way I was looking for.

``````AverageMet12<- Hour12Met%>%
filter(LWC >= 0.05)%>%
collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
group_by(FullDate)%>%
dplyr::summarise(LWCweight = weighted.mean(LWC, w = CLOUD_HR))

``````

Specfiying dplyr::summarise seemed to fix the code. Thank you for help and patience.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.