Weighted Average Over Time Series

Hi Everyone,

I'm currently trying to calculate a weighted average using dplyr on a time series every 12 hours. I've writte code that seems to work properly for a normal arithmetic mean. Seen here:

AverageMet12<-Hour12Met%>%
  filter(LWC >= 0.05)%>%
  collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
  group_by(FullDate)%>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

However, I want to calculated a weighted average, so I would assume that this code would work but it doesn't


AverageMet12<-Hour12Met%>%
  filter(LWC >= 0.05)%>%
  collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
  group_by(FullDate)%>%
  summarise_if(is.numeric, weighted.mean(w = CLOUD_HR), na.rm = TRUE)

I get there error:
Error in weighted.mean(w = CLOUD_HR) : object 'CLOUD_HR' not found

Is there a better what of doing this?

My variable names include "CLOUD_HR", "LWC", "PSA", and "WDR". I know I have the spelling of my variables correct.

The issue is that when you use summarise_ or mutate_ you cannot call variables by name in the functions. Does every group, FullDate, have the same length and weights? If so, you could save the weights in a vector and pass to the summarise_if function:

weight_vec <-  # save the weights here 
AverageMet12<-Hour12Met%>%
  filter(LWC >= 0.05)%>%
  collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
  group_by(FullDate)%>%
  summarise_if(is.numeric, weighted.mean, w = weight_vec, na.rm = TRUE)

I appreciate the quick response. They all do have the same length within in the dataframe. However, when I run this code, I get this error when setting the weighted vector up outside the pipes like you did:

Error: Problem with summarise() input LWC.
x 'x' and 'w' must have the same length

I think this is due to the collapse_by function creating vector lengths that are different than the weight vector, but I could be wrong about that.

Ok that means that no, the FullDate groups do not have the same length. Do you need this to be generalized for any reason? For a generalized solution, I think that using something in the purrr package would be the best.

If you don't need it to be generalized, I would just type out the functions in a summarize:

AverageMet12<-Hour12Met%>%
  filter(LWC >= 0.05)%>%
  collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
  group_by(FullDate)%>%
  summarise(LWC = weighted.mean(LWC, w = CLOUD_HR, na.rm = TRUE))

You might also want to see how many observations you have for each FullDate in summarize by adding n = n()

I'll look more into solving this with purrr (something I haven't tried yet). However, the other solution you give here only gives me the weighted average for the entire data set, not for each group.

Ok, I suggest you look at the output of your code one "pipe" at a time. If the group_by(FullDate) produces only one row, then you only have one group in FullDate. What is the output of:

Hour12Met%>%
  filter(LWC >= 0.05)%>%
  collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
  group_by(FullDate)%>%
  summarise(n = n())

The following is a MWE for calculating the weighted mean of a column based on another column after collapsing by a time period.

library(tidyverse)
library(tibbletime)

data(FB)
FB <- as_tbl_time(FB, date)

# Collapse to weekly dates
collapse_by(FB, "weekly") %>% 
  group_by(date) %>% 
  summarize(open = weighted.mean(high,w = close))

Woah, I'm a bit embarrased. I figured out the issue. R was thinking I was using the plyr summarise function rather than the dplyr summarise function. This code worked the way I was looking for.

AverageMet12<- Hour12Met%>%
  filter(LWC >= 0.05)%>%
  collapse_by("12 hourly", Date = FullDate, start_date =  "2014-05-31 17:59:00")%>%
  group_by(FullDate)%>%
  dplyr::summarise(LWCweight = weighted.mean(LWC, w = CLOUD_HR))

Specfiying dplyr::summarise seemed to fix the code. Thank you for help and patience.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.