I'm guessing there are better ways to build model formulas, but here's one option: We create a function that builds the formula as a string and then turns it into a formula object at the end:
library(tidyverse)
# Fake data
set.seed(2)
dat = tibble(y = 1:20,
x = cumsum(rnorm(20)),
z = cumsum(rnorm(20)),
w = rnorm(20),
r = rnorm(20)) %>%
mutate(xlag1 = lag(x,1),
xlag2 = lag(x,2),
zlag1 = lag(z,1),
zlag2 = lag(z,2))
my_lag = function(DV, lag.vars, max.lag, other.vars=NULL) {
form = paste(paste0("lag(", rep(lag.vars, each=max.lag + 1), ", ", 0:max.lag, ")"), collapse=" + ")
form = paste(DV, " ~ ", form)
as.formula(paste(form, paste(other.vars, collapse=" + "), sep=" + "))
}
Here's an example of the function output:
my_lag("y", c("x","z"), 2, c("w","r"))
y ~ lag(x, 0) + lag(x, 1) + lag(x, 2) + lag(z, 0) + lag(z, 1) +
lag(z, 2) + w + r
An example of running the model for multiple maximum lags.
models = map(0:3,
~ my_lag("y", c("x","z"), .x, c("w","r")) %>% lm(data=dat))
models
List of model results for each lag
$`max lag=0`
Call:
lm(formula = ., data = dat)
Coefficients:
(Intercept) lag(x, 0) lag(z, 0) w r
2.1695 -0.4682 1.9575 -0.1563 0.7344
$`max lag=1`
Call:
lm(formula = ., data = dat)
Coefficients:
(Intercept) lag(x, 0) lag(x, 1) lag(z, 0) lag(z, 1) w
3.015369 0.016751 -0.309248 1.158572 0.867487 0.004935
r
0.506720
$`max lag=2`
Call:
lm(formula = ., data = dat)
Coefficients:
(Intercept) lag(x, 0) lag(x, 1) lag(x, 2) lag(z, 0) lag(z, 1)
3.26021 -0.63063 0.93202 -0.34591 2.08166 -1.71304
lag(z, 2) w r
1.99496 -0.05725 0.58461
$`max lag=3`
Call:
lm(formula = ., data = dat)
Coefficients:
(Intercept) lag(x, 0) lag(x, 1) lag(x, 2) lag(x, 3) lag(z, 0)
3.54903 -0.47732 0.40362 0.24174 0.09897 1.79321
lag(z, 1) lag(z, 2) lag(z, 3) w r
-0.54183 -0.25646 1.76956 0.19338 -0.19972
Check that we get the same coefficients as in your example.
model.check = lm(y ~ ., data=dat)
models[[3]]
model.check
> models[[3]]
Call:
lm(formula = ., data = dat)
Coefficients:
(Intercept) lag(x, 0) lag(x, 1) lag(x, 2) lag(z, 0) lag(z, 1)
3.26021 -0.63063 0.93202 -0.34591 2.08166 -1.71304
lag(z, 2) w r
1.99496 -0.05725 0.58461
> model.check
Call:
lm(formula = y ~ ., data = dat)
Coefficients:
(Intercept) x z w r xlag1
3.26021 -0.63063 2.08166 -0.05725 0.58461 0.93202
xlag2 zlag1 zlag2
-0.34591 -1.71304 1.99496