I want to estimate ARIMA models and use them to make predictions following two different paradigms.
One I will call self-feeding and the other data-dependent. The self-feeding prediction, which I know how to implement, feeds the predictions back into the model without relying on past data, except for the first \max\{p,q\} values, where p is the autoregressive order and q is the moving-average order. The data-dependent prediction, which I am not sure how to implement, does not feed the model its own predictions. Instead, it keeps being fed the data the user already has.
The self-feeding paradigm allows the forecasts to "roam freer" whereas the data-dependent paradigm has a more corrective nature. In fact, one can think of the data-dependent paradigm as a way to keep the forecasts informed at each one-step-ahead prediction.
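To put the distinction in symbols (a deliberate simplification, assuming a pure AR(1) with coefficient \phi and omitting the noise term), the two-step-ahead forecast under each paradigm would be

\hat{x}_{t+2} = \phi \hat{x}_{t+1} (self-feeding) versus \hat{x}_{t+2} = \phi x_{t+1} (data-dependent),

where \hat{x}_{t+1} is the model's own previous forecast and x_{t+1} is the newly observed value.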
Here's a bit of code to help drive the point home. In this first naïve example, I expect the self-feeding prediction to output a sequence that starts from the last training data point, in this case 30, and to follow closely from it, i.e., {31, 32, 33, ...}. I know how to do this with forecast::forecast().
# Generate some data
set.seed(123)
# Suppose I have a dataset D = {1, 2, 3, ..., 60}
# but choose to train the model on only the first half of it.
x = 1:30
uncertainty = runif(n = length(x))
y = x + uncertainty
Now, I fit a model and make a 10-step-ahead prediction. The output of predictions$mean shows a forecast sequence that closely follows what I expected.
library(forecast)
fit <- forecast::auto.arima(y, seasonal = FALSE)
Let's inspect the model. As one can see, the model has drift, as expected, and two autoregressive coefficients.
summary(fit)
# Series: y
# ARIMA(2,1,0) with drift
#
# Coefficients:
# ar1 ar2 drift
# -0.6302 -0.4995 0.9938
# s.e. 0.1617 0.1590 0.0282
#
# sigma^2 = 0.1102: log likelihood = -7.97
# AIC=23.93 AICc=25.6 BIC=29.4
#
# Training set error measures:
# ME RMSE MAE MPE MAPE MASE
# Training set 0.006437989 0.3089803 0.2553583 0.4930526 2.514604 0.2566012
# ACF1
# Training set -0.05921137
Finally, let's make a self-feeding prediction.
predictions <- forecast(fit, h = 10)  # 10-step-ahead, self-feeding forecast
predictions$mean
# Time Series:
# Start = 31
# End = 40
# Frequency = 1
# [1] 31.37578 32.28944 33.21643 34.29237
# [5] 35.26779 36.23214 37.25369 38.24472
# [9] 39.22641 40.22923
Now, suppose I try to use the model under the data-dependent paradigm. This time I don't want the model to be fed its own predictions; instead, I want to use the data I have but didn't use to train the model, {31, 32, ..., 60}. For the first prediction, I feed the model the data points 29 and 30 to get the one-step-ahead forecast for time t+1 of the forecast horizon T,

\hat{x}_{t+1} = 30\phi_1 + 29\phi_2 + \eta,

where \eta is a random realization of known mean and variance, and \phi_1, \phi_2 are autoregressive coefficients. The following forecast would then be

\hat{x}_{t+2} = 31\phi_1 + 30\phi_2 + \eta,

and so on. Note that if we had used the self-feeding paradigm, the previous equation would instead be \hat{x}_{t+2} = \hat{x}_{t+1}\phi_1 + 30\phi_2 + \eta, where \hat{x}_{t+1}, the model's own first forecast, is not necessarily equal to 31.
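To make the recursion concrete, here is a minimal sketch of the data-dependent loop written directly from the simplified equations above. It applies the AR coefficients to the levels and drops \eta, the differencing, and the drift of the fitted ARIMA(2,1,0) model, so the numbers are only illustrative; held_out stands in for the second half of the dataset.

# Minimal sketch: data-dependent one-step-ahead recursion (illustrative only).
# Applies the AR coefficients directly to the levels, ignoring the differencing,
# drift and noise term of the actual fitted model.
phi <- coef(fit)[c("ar1", "ar2")]
held_out <- 31:60                # data available but not used for training
history  <- c(29, 30)            # last two observed training values
one_step <- numeric(length(held_out))
for (i in seq_along(held_out)) {
  # forecast the next value from the two most recent *observed* values
  one_step[i] <- phi[1] * history[2] + phi[2] * history[1]
  # feed in the actual observation, not the forecast we just produced
  history <- c(history[2], held_out[i])
}
one_step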
Is there any function, either from a package or from base R, that does this job? I know the function predict does it with the argument newdata, but I am not sure this would work for an ARIMA model because there is not, per se, an independent variable to be fed into the model.
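For reference, a small illustration of what predict actually accepts for an ARIMA fit: stats::predict.Arima takes n.ahead and newxreg (exogenous regressors only); there is no newdata argument. The fit below is a separate, plain stats::arima fit (without drift), used only to show the call.

# Illustration: predict() on an ARIMA fit takes n.ahead, not newdata;
# newxreg is only for exogenous regressors, which this model doesn't have.
fit_stats <- stats::arima(y, order = c(2, 1, 0))
predict(fit_stats, n.ahead = 1)   # one-step-ahead forecast from the end of y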