Hello,
I am a self-taught R user and I have a 2 part question.
Here is a small snippet of the data that I am working on:
library(tidyverse)
df <- tibble(date = c("2018-10-27", "2018-10-26", "2018-10-26", "2018-10-26", "2018-10-25", "2018-10-25"),
value = c(2, 3, 4, 5, 4, 7))
df <- df %>%
mutate(date = as.Date(date))
df
> df
# A tibble: 6 x 2
date value
<date> <dbl>
1 2018-10-27 2.00
2 2018-10-26 3.00
3 2018-10-26 4.00
4 2018-10-26 5.00
5 2018-10-25 4.00
6 2018-10-25 7.00
Notice that there are multiple duplicates for the date column. I am working with daily data. On some dates, there are several entries. On other dates there may be only one or none.
- I want to calculate a 90 row moving average
Normally, when I calculate moving averages, I usually use something like this (example with a 3-row moving average):
moving_avg_3_rows <- function(x){
(lag(x,2) + lag(x,1) + x)/3
}
df_1 <- df %>%
mutate(value_2 = moving_avg_3_rows(value))
df_1
> df_1
# A tibble: 6 x 3
date value value_2
<date> <dbl> <dbl>
1 2018-10-27 2.00 NA
2 2018-10-26 3.00 NA
3 2018-10-26 4.00 3.00
4 2018-10-26 5.00 4.00
5 2018-10-25 4.00 4.33
6 2018-10-25 7.00 5.33
However, if i want to calculate a 90-row moving average, it seems really clumsy to have to type out "lag(x,n)" all the way from 89 to 1. Is there a way to write a function that goes up to "n" for the rolling averages, without having to type them all out by hand? I also want to make "n" customizable.
I tried a few clumsy for loops, but cannot seem to figure this out.
- I want to calculate a 90 day moving average
The duplicates dates in my data are creating a problem. I tried using some of the functions from the tidyquant package. From what I can understand, the rolling functions cannot be used here because the zoo package requires unique date information as the index.
(side note: I am using this site as a reference https://www.business-science.io/timeseries-analysis/2017/07/23/tidy-timeseries-analysis-pt-2.html)
Based on my understanding, it looks like I have two options:
a) Summarize/collapse the duplicate dates into one date and use that towards the moving average.
b) Introduce some random noise into the date index (e.g. by adding hours, minutes and seconds) so that each duplicate day can then be converted into a unique date/time stamp.
I am not sure how I would go about either of those two routes. Any help would be really appreciated!
Thanks in advance.