Hi there,
I want to calculate the maximum lag value of the eight quarters before. Since my solution is long and wrong, I wanted to ask you people for some ideas.
library(dplyr)
So here is a alternated dataset of one id. As you can see I do not have information for every quarter for every id. In this case 1993/4 is missing.
df <- tibble::tibble(
id = c(rep(1,14)),
year = c(rep(1990,4),rep(1991,4),rep(1992,3),rep(1993,3)),
quarter = c(rep(seq(1:4),2),rep(seq(1:3),2)),
var1 = c(rep(1,14)),
var2 = c(rep(0,2),rep(1,12)),
var3 = c(rep(0,14)),
var4 = c(rep(0,14)),
var5 = c(rep(0,10),1,0,0,1),
quarter_num = c(1:11,13:15)
)
I am interested if var1:var5 (will build the loop later) occurred in the past 8 quarters, but I can´t get my head around how to handle the missing quarter correctly:
df2 <- df %>% group_by(id) %>%
mutate(lag8 = ifelse(quarter_num-lag(quarter_num,8)-8 <= 0 | is.na(quarter_num-lag(quarter_num,8)-8) ,lag(var2,8),NA),
lag7 = ifelse(quarter_num-lag(quarter_num,7)-7 <= 0 | is.na(quarter_num-lag(quarter_num,7)-7) ,lag(var2,7),NA),
lag6 = ifelse(quarter_num-lag(quarter_num,6)-6 <= 0 | is.na(quarter_num-lag(quarter_num,6)-6) ,lag(var2,6),NA),
lag5 = ifelse(quarter_num-lag(quarter_num,5)-5 <= 0 | is.na(quarter_num-lag(quarter_num,5)-5) ,lag(var2,5),NA),
lag4 = ifelse(quarter_num-lag(quarter_num,4)-4 <= 0 | is.na(quarter_num-lag(quarter_num,4)-4) ,lag(var2,4),NA),
lag3 = ifelse(quarter_num-lag(quarter_num,3)-3 <= 0 | is.na(quarter_num-lag(quarter_num,3)-3) ,lag(var2,3),NA),
lag2 = ifelse(quarter_num-lag(quarter_num,2)-2 <= 0 | is.na(quarter_num-lag(quarter_num,2)-2) ,lag(var2,2),NA),
lag1 = ifelse(quarter_num-lag(quarter_num,1)-1 <= 0 | is.na(quarter_num-lag(quarter_num,1)-1) ,lag(var2,1),NA),
lag_var2 = pmax(lag1,lag2,lag3,lag4,lag5,lag6,lag7,lag8, na.rm = TRUE))
Any ideas or dplyr magic?
Thank you!