How to make "data-dependent" predictions for an ARIMA model in R?

I want to estimate ARIMA models and use them to make predictions following two different paradigms.

One I will call self-feeding and the other data-dependent. The self-feeding prediction, which I know how to implement, feeds its own predictions back into the model and relies on past data only for the first \max\{p,q\} values, where p is the autoregressive order and q is the moving-average order. The data-dependent prediction, which I am not sure how to implement, does not feed its own output back into the model. Instead, it keeps being fed the data the user has.

The self-feeding paradigm allows the forecasts to "roam freer" whereas the data-dependent paradigm has a more corrective nature. In fact, one can think of the data-dependent paradigm as a way to keep the forecasts informed at each one-step-ahead prediction.

Here's a bit of code to help drive the point home. In this first naïve example, I expect the self-feeding prediction to output a sequence that starts from the last training data point, in this case, 30, and to follow closely from it, i.e., {31, 32, 33, ...}. I know how to do this with forecast::forecast.

# Generate some data
set.seed(123)

# Suppose I have a dataset D = {1, 2, 3, ..., 60}
# but choose to train the model on only the first half of it.
x = 1:30
uncertainty = runif(n = length(x))
y = x + uncertainty 

Now I fit a model and forecast 10 steps ahead. The output of predictions$mean, printed further down, closely follows the sequence I expected.

library(forecast)
fit <- forecast::auto.arima(y, seasonal = FALSE) 

Let's inspect the model. As one can see, the model has a drift, as expected, and two autoregressive coefficients.

summary(fit)
# Series: y 
# ARIMA(2,1,0) with drift 
# 
# Coefficients:
#           ar1      ar2   drift
#       -0.6302  -0.4995  0.9938
# s.e.   0.1617   0.1590  0.0282
# 
# sigma^2 = 0.1102:  log likelihood = -7.97
# AIC=23.93   AICc=25.6   BIC=29.4
# 
# Training set error measures:
#                       ME      RMSE       MAE       MPE     MAPE      MASE
# Training set 0.006437989 0.3089803 0.2553583 0.4930526 2.514604 0.2566012
#                     ACF1
# Training set -0.05921137

Finally, let's make a self-feeding prediction.

predictions <- forecast(fit)
predictions$mean
# Time Series:
# Start = 31 
# End = 40 
# Frequency = 1 
# [1] 31.37578 32.28944 33.21643 34.29237
# [5] 35.26779 36.23214 37.25369 38.24472
# [9] 39.22641 40.22923

Now, suppose I try to use the model under the data-dependent paradigm. This time I don't want the model to be fed the predictions it outputs. Instead, I want to use the data I have but didn't use to train the model, {31, 32, ..., 60}. For the first prediction, I feed the model the data points 29 and 30 to get the one-step-ahead forecast, t+1, from the forecast horizon, T.

x^T_{t+1} = 30\phi_1 + 29\phi_2 + \eta = y_1

where \eta is a random realization of known mean and variance, and \phi is an autoregressive coefficient. The next forecast is

x^{t+1}_{t+2} = 31\phi_1 + 30\phi_2 + \eta = y_2

and so on. Note that if we had used a self-feeding paradigm, the previous equation would be x^{t+1}_{t+2} = y_1\phi_1 + 30\phi_2 + \eta, where y_1 is not necessarily equal to 31.
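
To make the difference between the two recursions concrete, here is a minimal base R sketch. The coefficients in phi are hypothetical values chosen only for illustration (they are not the fitted ones), \eta is set to zero, and drift and differencing are ignored, so the numbers only show which inputs each paradigm uses, not a realistic forecast.

```r
# Hypothetical AR(2) coefficients, for illustration only (not the fitted model)
phi <- c(0.6, 0.3)

# 29 and 30 are the last training points; 31..34 are held-out observations
observed <- 29:34
h <- 4

self_fed <- numeric(h)  # forecasts that reuse earlier forecasts
data_dep <- numeric(h)  # forecasts that always use observed values

history <- observed[1:2]  # grows with the model's own predictions
for (i in seq_len(h)) {
  n <- length(history)

  # Self-feeding: the two most recent values may themselves be forecasts
  self_fed[i] <- phi[1] * history[n] + phi[2] * history[n - 1]
  history <- c(history, self_fed[i])

  # Data-dependent: always the two most recent observed values
  data_dep[i] <- phi[1] * observed[i + 1] + phi[2] * observed[i]
}

self_fed
data_dep
```

Both vectors agree on the first step, since both start from 29 and 30. From the second step on, self_fed uses its own previous output while data_dep uses 31, 32, and so on.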

Is there any function either from a package or base R that does this job?

I know the function predict does this with the argument newdata, but I am not sure it would work for an ARIMA model because there is, per se, no independent variable to be fed into the model.
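
One approach that may cover the data-dependent case, sketched here under assumptions: forecast::Arima() accepts a previously fitted model through its model argument and applies it to a new series without re-estimating the coefficients, and fitted() on the result returns one-step-ahead forecasts that are each conditioned on the observed past. The names y_all, y_train, refit, and one_step are introduced only for this example, and the held-out half of the data is simulated the same way as the training half.

```r
library(forecast)

set.seed(123)

# Full dataset: the first 30 points are for training, the last 30 are held out
x_all <- 1:60
y_all <- x_all + runif(n = length(x_all))
y_train <- y_all[1:30]

# Estimate the model on the training half only
fit <- auto.arima(y_train, seasonal = FALSE)

# Apply the already-estimated model to the full series;
# the coefficients are reused, not re-estimated
refit <- Arima(y_all, model = fit)

# One-step-ahead forecasts for the held-out period,
# each based on the observed values up to the previous time point
one_step <- fitted(refit)[31:60]
one_step
```

Passing the full series rather than only the held-out part means the forecast for point 31 is conditioned on the observed points up to 30, which matches the first equation above.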
