I want to estimate ARIMA models and use them to make predictions following two different paradigms.
One I will call self-feeding and the other data-dependent. The self-feeding prediction, which I know how to implement, feeds the predictions back into the model without relying on past data, except for the first \max\{p,q\} values, where p is the autoregressive order and q is the moving-average order. The data-dependent prediction, which I am not sure how to implement, does not feed the model its own predictions. Instead, it keeps being fed the data the user already has.
The self-feeding paradigm allows the forecasts to "roam freer" whereas the data-dependent paradigm has a more corrective nature. In fact, one can think of the data-dependent paradigm as a way to keep the forecasts informed at each one-step-ahead prediction.
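To put the distinction in symbols (a deliberate simplification, assuming a pure AR(1) with coefficient \phi and omitting the noise term), the two-step-ahead forecast under each paradigm would be

\hat{x}_{t+2} = \phi \hat{x}_{t+1} (self-feeding) versus \hat{x}_{t+2} = \phi x_{t+1} (data-dependent),

where \hat{x}_{t+1} is the model's own previous forecast and x_{t+1} is the newly observed value.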
Here's a bit of code to help drive the point home. In this first naïve example, I expect the self-feeding prediction to output a sequence that starts from the last training data point, in this case 30, and to follow closely from it, i.e., {31, 32, 33, ...}. I know how to do this with forecast::forecast().
# Generate some data
set.seed(123)
# Suppose I have a dataset D = {1, 2, 3, ..., 60}
# but choose to train the model on only the first half of it.
x = 1:30
uncertainty = runif(n = length(x))
y = x + uncertainty
Now, I fit a model and make a 10-step-ahead prediction. The output of predictions$mean shows a forecast sequence that closely follows what I expected.
library(forecast)
fit <- forecast::auto.arima(y, seasonal = FALSE)
Let's inspect the model. As one can see, the model has drift, as expected, and two autoregressive coefficients.
summary(fit)
# Series: y
# ARIMA(2,1,0) with drift
#
# Coefficients:
# ar1 ar2 drift
# -0.6302 -0.4995 0.9938
# s.e. 0.1617 0.1590 0.0282
#
# sigma^2 = 0.1102: log likelihood = -7.97
# AIC=23.93 AICc=25.6 BIC=29.4
#
# Training set error measures:
# ME RMSE MAE MPE MAPE MASE
# Training set 0.006437989 0.3089803 0.2553583 0.4930526 2.514604 0.2566012
# ACF1
# Training set -0.05921137
Finally, let's make a self-feeding prediction.
predictions <- forecast(fit, h = 10)  # 10-step-ahead, self-feeding forecast
predictions$mean
# Time Series:
# Start = 31
# End = 40
# Frequency = 1
# [1] 31.37578 32.28944 33.21643 34.29237
# [5] 35.26779 36.23214 37.25369 38.24472
# [9] 39.22641 40.22923
Now, suppose I try to use the model under the data-dependent paradigm. This time I don't want the model to be fed its own predictions; instead, I want to use the data I have but didn't use to train the model, {31, 32, ..., 60}. For the first prediction, I feed the model the data points 29 and 30 to get the one-step-ahead forecast for time t+1 of the forecast horizon T,

\hat{x}_{t+1} = 30\phi_1 + 29\phi_2 + \eta,

where \eta is a random realization of known mean and variance, and \phi_1, \phi_2 are autoregressive coefficients. The following forecast would then be

\hat{x}_{t+2} = 31\phi_1 + 30\phi_2 + \eta,

and so on. Note that if we had used the self-feeding paradigm, the previous equation would instead be \hat{x}_{t+2} = \hat{x}_{t+1}\phi_1 + 30\phi_2 + \eta, where \hat{x}_{t+1}, the model's own first forecast, is not necessarily equal to 31.
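To make the recursion concrete, here is a minimal sketch of the data-dependent loop written directly from the simplified equations above. It applies the AR coefficients to the levels and drops \eta, the differencing, and the drift of the fitted ARIMA(2,1,0) model, so the numbers are only illustrative; held_out stands in for the second half of the dataset.

# Minimal sketch: data-dependent one-step-ahead recursion (illustrative only).
# Applies the AR coefficients directly to the levels, ignoring the differencing,
# drift and noise term of the actual fitted model.
phi <- coef(fit)[c("ar1", "ar2")]
held_out <- 31:60                # data available but not used for training
history  <- c(29, 30)            # last two observed training values
one_step <- numeric(length(held_out))
for (i in seq_along(held_out)) {
  # forecast the next value from the two most recent *observed* values
  one_step[i] <- phi[1] * history[2] + phi[2] * history[1]
  # feed in the actual observation, not the forecast we just produced
  history <- c(history[2], held_out[i])
}
one_step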
Is there any function, either from a package or from base R, that does this job? I know the function predict does it with the argument newdata, but I am not sure this would work for an ARIMA model because there is not, per se, an independent variable to be fed into the model.
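For reference, a small illustration of what predict actually accepts for an ARIMA fit: stats::predict.Arima takes n.ahead and newxreg (exogenous regressors only); there is no newdata argument. The fit below is a separate, plain stats::arima fit (without drift), used only to show the call.

# Illustration: predict() on an ARIMA fit takes n.ahead, not newdata;
# newxreg is only for exogenous regressors, which this model doesn't have.
fit_stats <- stats::arima(y, order = c(2, 1, 0))
predict(fit_stats, n.ahead = 1)   # one-step-ahead forecast from the end of y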