# How to make "data-dependent" predictions for an ARIMA model in R?

I want to estimate ARIMA models and use them to make predictions following two different paradigms.

One I will call self-feeding and the other data-dependent. The self-feeding prediction, which I know how to implement, feeds the model's own predictions back into it, without relying on past data except for the first $\max\{p,q\}$ values, where $p$ is the autoregressive order and $q$ is the moving-average order. The data-dependent prediction, which I am not sure how to implement, does not feed its predictions back into the model. Instead, the model keeps being fed the data the user has.

The self-feeding paradigm allows the forecasts to "roam more freely", whereas the data-dependent paradigm is more corrective. In fact, one can think of the data-dependent paradigm as a way to keep the forecasts informed by the latest observation at each one-step-ahead prediction.

Here's a bit of code to help drive the point home. In this first naïve example, I expect the self-feeding prediction to output a sequence that starts from the last training data point, in this case, 30, and to follow closely from it, i.e., {31, 32, 33, ...}. I know how to do this with forecast::forecast.

```r
# Generate some data
set.seed(123)

# Suppose I have a dataset D = {1, 2, 3, ..., 60}
# but choose to train the model on only the first half of it.
x <- 1:30
uncertainty <- runif(n = length(x))
y <- x + uncertainty
```
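The snippet above only generates the training half of D. The minimal sketch below assumes that the held-out half {31, ..., 60} gets the same kind of uniform noise. It builds the full dataset and splits it. The names `y_full`, `y_train` and `y_test` are hypothetical and are introduced only for illustration.

```r
# Rebuild the full dataset D = {1, ..., 60} with the same kind of noise,
# then split it into a training half and a held-out half.
set.seed(123)
n_total <- 60
n_train <- 30
y_full  <- (1:n_total) + runif(n = n_total)

y_train <- y_full[1:n_train]                # used to fit the model
y_test  <- y_full[(n_train + 1):n_total]    # held out: fed in under the data-dependent paradigm

# runif() draws sequentially, so with the same seed the training half
# equals the `y` generated in the question.
set.seed(123)
y_question <- (1:30) + runif(n = 30)
stopifnot(isTRUE(all.equal(y_train, y_question)))
```

Because `runif()` consumes the random stream in order, drawing 60 values instead of 30 leaves the first 30 unchanged. The fitted model is therefore unaffected by the held-out half.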


Now, I fit a model and make a prediction 10 steps ahead. The output of `predictions$mean` shows a forecast sequence that closely follows the results I expected.

```r
library(forecast)
fit <- forecast::auto.arima(y, seasonal = FALSE)
```

Let's inspect the model. As one can see, the model has a drift, as expected, and two autoregressive coefficients.

```r
summary(fit)
# Series: y
# ARIMA(2,1,0) with drift
#
# Coefficients:
#           ar1      ar2   drift
#       -0.6302  -0.4995  0.9938
# s.e.   0.1617   0.1590  0.0282
#
# sigma^2 = 0.1102:  log likelihood = -7.97
# AIC=23.93   AICc=25.6   BIC=29.4
#
# Training set error measures:
#                       ME      RMSE       MAE       MPE     MAPE      MASE
# Training set 0.006437989 0.3089803 0.2553583 0.4930526 2.514604 0.2566012
#                     ACF1
# Training set -0.05921137
```

Finally, let's make a self-feeding prediction.

```r
predictions <- forecast(fit)
predictions$mean
# Time Series:
# Start = 31
# End = 40
# Frequency = 1
# [1] 31.37578 32.28944 33.21643 34.29237
# [5] 35.26779 36.23214 37.25369 38.24472
# [9] 39.22641 40.22923
```
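The sketch below makes the self-feeding recursion explicit by reproducing the point forecasts by hand. It uses the rounded coefficients printed by `summary(fit)`. It assumes the drift enters as $\Delta y_t - \delta = \phi_1(\Delta y_{t-1} - \delta) + \phi_2(\Delta y_{t-2} - \delta) + \varepsilon_t$ with $\delta$ the drift, and that the innovation is replaced by its mean of zero. `self_feeding_forecast()` is a hypothetical helper, not part of `forecast`. Each forecast is appended to the history, which is what "self-feeding" means here.

```r
# Coefficients as printed by summary(fit), rounded to 4 decimals.
phi   <- c(-0.6302, -0.4995)   # ar1, ar2
drift <- 0.9938

# Hypothetical helper: h-step self-feeding forecasts of an ARIMA(2,1,0)
# with drift (assumed parameterization: AR(2) on de-drifted differences).
self_feeding_forecast <- function(y, phi, drift, h) {
  u     <- diff(y) - drift   # de-drifted first differences
  level <- y[length(y)]      # last observed value
  out   <- numeric(h)
  for (i in seq_len(h)) {
    n      <- length(u)
    u_next <- phi[1] * u[n] + phi[2] * u[n - 1]  # innovation set to its mean, 0
    level  <- level + drift + u_next
    out[i] <- level
    u      <- c(u, u_next)   # feed the forecast back into the history
  }
  out
}

set.seed(123)
y  <- 1:30 + runif(n = 30)
sf <- self_feeding_forecast(y, phi, drift, h = 10)
round(sf, 3)
```

Since the coefficients are rounded, the values should agree with `predictions$mean` only up to small numerical differences.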


Now, suppose I want to use the model under the data-dependent paradigm. This time, I don't want the model to be fed the predictions it outputs. Instead, I want to use the data I have but didn't use to train the model, {31, 32, ..., 60}. For the first prediction, I feed the model the data points 29 and 30 to get the one-step-ahead forecast of time $t+1$ from the forecast origin $t = 30$. Schematically, treating the model as a pure AR(2) in levels (that is, ignoring the differencing and the drift),

$$x^{t}_{t+1} = 30\phi_1 + 29\phi_2 + \eta = y_1$$

where $\eta$ is a random term with known mean and variance, and $\phi_1$ and $\phi_2$ are the autoregressive coefficients. The next forecast is

$$x^{t+1}_{t+2} = 31\phi_1 + 30\phi_2 + \eta = y_2$$

and so on. Note that if we had used the self-feeding paradigm, the second forecast would be made from the same origin $t$, as $x^{t}_{t+2} = y_1\phi_1 + 30\phi_2 + \eta$, where $y_1$ is not necessarily equal to 31.
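The equations above are written in levels, but the fitted model is an ARIMA(2,1,0) with drift. The minimal sketch below assumes the drift enters as $\Delta y_t - \delta = \phi_1(\Delta y_{t-1} - \delta) + \phi_2(\Delta y_{t-2} - \delta) + \varepsilon_t$ and plugs in the rounded coefficients printed by `summary(fit)`. With those assumptions, the data-dependent recursion can be written by hand. At each origin $s$, the forecast of $y_{s+1}$ uses only observed values up to $s$ and never an earlier forecast. `data_dependent_forecast()` is a hypothetical helper, not a function from `forecast` or base R.

```r
phi   <- c(-0.6302, -0.4995)   # ar1, ar2 from summary(fit), rounded
drift <- 0.9938

# Hypothetical helper: one-step-ahead ("data-dependent") forecasts.
# For each origin s = n_train, ..., n_total - 1, the forecast of y[s + 1]
# is built from observed values y[s], y[s - 1], y[s - 2] only.
data_dependent_forecast <- function(y_all, n_train, phi, drift) {
  n_total <- length(y_all)
  out <- numeric(n_total - n_train)
  for (s in n_train:(n_total - 1)) {
    u_s   <- y_all[s]     - y_all[s - 1] - drift   # observed de-drifted differences
    u_sm1 <- y_all[s - 1] - y_all[s - 2] - drift
    out[s - n_train + 1] <- y_all[s] + drift + phi[1] * u_s + phi[2] * u_sm1
  }
  out
}

set.seed(123)
y_full <- 1:60 + runif(n = 60)
dd     <- data_dependent_forecast(y_full, n_train = 30, phi, drift)
errors <- y_full[31:60] - dd   # one-step-ahead forecast errors
```

The first element of `dd` coincides with the first self-feeding forecast, because both use the same information at origin $t = 30$. From the second step on, the two paradigms differ whenever the observed $y_{31}$ differs from its forecast.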

Is there any function either from a package or base R that does this job?

I know that the function `predict` does this for regression models through the argument `newdata`, but I am not sure this would work for an ARIMA model because there is, per se, no independent variable to be fed into the model.
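One possible base-R route is sketched below, not presented as a definitive answer. First, estimate the model on the training half with `stats::arima()`. Then rerun it on the full series with every coefficient frozen through `fixed` (plus `transform.pars = FALSE`), so nothing is re-estimated. The residuals of the frozen fit are then the one-step-ahead forecast errors (after the first few observations), and subtracting them from the data gives the data-dependent forecasts. Two assumptions apply. The drift is emulated by passing the time index as `xreg`, mirroring how `forecast` adds a drift term. The ARIMA(2,1,0) order is taken from the `auto.arima()` output above. The `forecast` package offers the same idea via `Arima(new_data, model = fit)` followed by `fitted()`.

```r
set.seed(123)
y_full  <- 1:60 + runif(n = 60)
n_train <- 30
y_train <- y_full[1:n_train]

# Time index as a regressor to emulate the drift term.
X_train <- cbind(drift = seq_len(n_train))
X_full  <- cbind(drift = seq_along(y_full))

# 1. Estimate ARIMA(2,1,0) with drift on the training half only.
fit_train <- arima(y_train, order = c(2, 1, 0), xreg = X_train)

# 2. Filter the full series with all coefficients fixed: the held-out
#    half is only fed in, never used for estimation.
fit_full <- arima(y_full, order = c(2, 1, 0), xreg = X_full,
                  fixed = coef(fit_train), transform.pars = FALSE)

# 3. Data-dependent (one-step-ahead) forecasts = observed - one-step errors.
one_step <- (y_full - residuals(fit_full))[(n_train + 1):length(y_full)]

# For comparison: self-feeding forecasts from the training fit.
self_feed <- predict(fit_train, n.ahead = 10,
                     newxreg = cbind(drift = n_train + 1:10))$pred
```

Freezing the coefficients is what separates this from simply refitting on all 60 points. The held-out observations update the filtered state at each step, but they never change $\phi_1$, $\phi_2$ or the drift.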