How to double-check a prediction from `forecast::forecast`?

HugoSantarem · August 7, 2024, 11:44pm

My goal to understand the forecast::forecast function's inner workings and manually verify the forecast calculations. To achieve this, I first generate data from an AR(2) process.

Generate data

set.seed(123)

# Generate data
y = vector(length = 48)
y[1] = 10
y[2] = 10
for (i in 3:48) {
  y[i] = 5 + 1.0*y[i-1] - 0.70*y[i-2] +  rnorm(n = 1, sd = 1)
}

# Save it as a time-series object
ts <- as.ts(y, frequency = 1)
# Check data and its ACF (if desired)
# par(mfrow = c(1,2))
# plot(ts, main = "Time series Y", ylab = "Values")
# Acf(ts, main = "ACF for time series Y")
# par(mfrow=c(1,1))

ARIMA modeling

To model the data, I applied forecast::auto.arima, which correctly fitted an AR(2) process with \phi_1 and \phi_2 values of 0.9380345 and -0.7773760, respectively. Additionally, an intercept y_0 of 7.3359679 was included. It is important to note that I used only half of the data points to fit the model.

# Fit ARIMA model and check model
fit <- auto.arima(y = ts[1:24],  seasonal = FALSE)
summary(fit)
plot(fit)

The final model is

y_{t+1} = 7.3359679 + 0.9380345y_t - 0.7773760y_{t-1} + \epsilon_{t+1},

where \epsilon is assumed to be a sampled innovation with known mean and variance.

Forecasting

Using the model fit, I forecasted 24 steps ahead. For this analysis, I want to verify the first forecast. Assuming t = 24, the first one-step-ahead forecast is y_{25}. The code to obtain the forecasts is as follows:

# One-step-ahead forecast
fit_forecast = forecast(fit, h = 24)
plot(fit_forecast)

Adapting the equation above to this forecast at hand, we have

y_{25} = 7.3359679 + 0.9380345 \cdot y_{24} - 0.7773760 \cdot y_{23} + \epsilon_{t+1},

To calculate this initial forecast, I assume the forecast function relies on the x values within the model fit, i.e., fit$x. These values represent the data on which the model was trained. Looking up y_{24} and y_{23}, I find their respective values are 6.049662 and 5.779341. Therefore, to manually calculate y_{25}, I should use the formula:

y_{25} = 7.3359679 + 0.9380345 \cdot 6.049662 - 0.7773760 \cdot 5.779341 + \epsilon_{t+1} \\ y_{25} = 8.518039+ \epsilon_{t+1}

When I check the forecasts in fit_forecast$mean, I see that the first value, y_{25}, is 7.339453. To manually verify this value, I need to know the value sampled for \epsilon_{t+1}. However, I couldn't find this value anywhere in fit_forecast.

One approach is to calculate \epsilon_{t+1} as the difference between the forecasted y_{25} and the expected value without the error term, 8.518039. This method seems counter-intuitive. I assume that forecast::forecast must have sampled a value for \epsilon_{t+1}, but it does not seem to keep a history of it.

If the mean of the innovation is zero, we would have

\mathbb{E}[y_{25}] = \mathbb{E}[8.518039+ \epsilon_{t+1}] \\ y_{25} = 8.518039

I did inspect forecast$fitted, but the first value is 8.576613, different from fit_forecast$mean[1].

Does forecast::forecast not provide a history of sampled innovations or have I missed something?

robjhyndman · August 8, 2024, 12:25am

I've already responded to this at forecasting - How to validate the predictions from the function forecast in the R? - Cross Validated
Rather than just re-post your questions, please take the time to read the suggested links. First check the parameterisation being used: 8.7 ARIMA modelling in R | Forecasting: Principles and Practice (2nd ed) Then read how to manually produce forecasts at 8.8 Forecasting | Forecasting: Principles and Practice (2nd ed)

As for the innovations, they are contained in the fitted model. Use residuals(fit).

HugoSantarem · August 9, 2024, 11:22pm

Thanks to A. Check in the CrossValidated forum, I could see the error in my derivation. After that was cleared up, it was easy to follow through the calculations to reach the same result as forecast::forecast. To see the answer, click on this link: forecasting - How to validate the predictions from the function forecast in the R? - Cross Validated

system · August 16, 2024, 11:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.