My goal to understand the forecast::forecast
function's inner workings and manually verify the forecast calculations. To achieve this, I first generate data from an AR(2) process.
Generate data
set.seed(123)
# Generate data
y = vector(length = 48)
y[1] = 10
y[2] = 10
for (i in 3:48) {
y[i] = 5 + 1.0*y[i-1] - 0.70*y[i-2] + rnorm(n = 1, sd = 1)
}
# Save it as a time-series object
ts <- as.ts(y, frequency = 1)
# Check data and its ACF (if desired)
# par(mfrow = c(1,2))
# plot(ts, main = "Time series Y", ylab = "Values")
# Acf(ts, main = "ACF for time series Y")
# par(mfrow=c(1,1))
ARIMA modeling
To model the data, I applied forecast::auto.arima
, which correctly fitted an AR(2) process with \phi_1 and \phi_2 values of 0.9380345 and -0.7773760, respectively. Additionally, an intercept y_0 of 7.3359679 was included. It is important to note that I used only half of the data points to fit the model.
# Fit ARIMA model and check model
fit <- auto.arima(y = ts[1:24], seasonal = FALSE)
summary(fit)
plot(fit)
The final model is
where \epsilon is assumed to be a sampled innovation with known mean and variance.
Forecasting
Using the model fit
, I forecasted 24 steps ahead. For this analysis, I want to verify the first forecast. Assuming t = 24, the first one-step-ahead forecast is y_{25}. The code to obtain the forecasts is as follows:
# One-step-ahead forecast
fit_forecast = forecast(fit, h = 24)
plot(fit_forecast)
Adapting the equation above to this forecast at hand, we have
To calculate this initial forecast, I assume the forecast
function relies on the x values within the model fit
, i.e., fit$x
. These values represent the data on which the model was trained. Looking up y_{24} and y_{23}, I find their respective values are 6.049662 and 5.779341. Therefore, to manually calculate y_{25}, I should use the formula:
When I check the forecasts in fit_forecast$mean
, I see that the first value, y_{25}, is 7.339453. To manually verify this value, I need to know the value sampled for \epsilon_{t+1}. However, I couldn't find this value anywhere in fit_forecast
.
One approach is to calculate \epsilon_{t+1} as the difference between the forecasted y_{25} and the expected value without the error term, 8.518039. This method seems counter-intuitive. I assume that forecast::forecast
must have sampled a value for \epsilon_{t+1}, but it does not seem to keep a history of it.
If the mean of the innovation is zero, we would have
I did inspect forecast$fitted
, but the first value is 8.576613, different from fit_forecast$mean[1]
.
Does forecast::forecast
not provide a history of sampled innovations or have I missed something?