The example below is from the FPP3 textbook section on seasonal ARIMA models. The value of the Ljung-Box statistic, lb1, is the same as that in the textbook. However, when I extract the residuals directly using the residuals function and calculate the Ljung-Box statistic (lb2), the value is different. If I apply the log transformation to the data before fitting the model, I get values equal to lb2 whether I use the augment function (lb3) or extract the residuals directly (lb4). Why is lb1 different from the other values I obtained?
In addition, the residual diagnostics page recommends using lag = 2*m for the ljung_box function, unless 2*m is particularly large (where m is the seasonal period). However, here we use 36 instead of 24. Is there a particular reason for this?
library(fpp3)
h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)
fit <- h02 %>%
  model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
(lb1 <- augment(fit) %>%
  features(.resid, ljung_box, lag = 36, dof = 6))
#> # A tibble: 1 x 3
#> .model lb_stat lb_pvalue
#> <chr> <dbl> <dbl>
#> 1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2)) 57.9 0.00163
(lb2 <- ljung_box(residuals(fit)$.resid, lag = 36, dof = 6))
# lb_stat lb_pvalue
#50.71198622 0.01044742
logh02 <- h02 %>%
  mutate(Cost = log(Cost))
fit2 <- logh02 %>%
  model(ARIMA(Cost ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
(lb3 <- augment(fit2) %>%
  features(.resid, ljung_box, lag = 36, dof = 6))
#> # A tibble: 1 x 3
#> .model lb_stat lb_pvalue
#> <chr> <dbl> <dbl>
#> 1 ARIMA(Cost ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2)) 50.7 0.0104
(lb4 <- ljung_box(residuals(fit2)$.resid, lag = 36, dof = 6))
# lb_stat lb_pvalue
#50.71198622 0.01044742
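To narrow this down, the comparison I would try next is sketched below. It rests on two assumptions about fabletools that I have not confirmed for every version: that augment() returns an .innov column (innovation residuals on the transformed scale) alongside .resid, and that residuals() accepts type = "innovation" or type = "response". The names lb_resid, lb_innov, lb_type_innov, lb_type_resp and lb_innov_24 are my own. If lb_innov agrees with lb2 and lb_type_resp agrees with lb1, the discrepancy would come down to the scale on which the residuals are computed rather than to the test itself.

```r
library(fpp3)

h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost) / 1e6)

fit <- h02 %>%
  model(ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2)))

aug <- augment(fit)

# Assumption: augment() provides both .resid (original scale) and
# .innov (residuals on the log scale used to fit the model).
lb_resid <- aug %>% features(.resid, ljung_box, lag = 36, dof = 6)
lb_innov <- aug %>% features(.innov, ljung_box, lag = 36, dof = 6)

# Assumption: residuals() takes a type argument; the result keeps
# the residuals in a column named .resid for either type.
lb_type_innov <- ljung_box(
  residuals(fit, type = "innovation")$.resid, lag = 36, dof = 6
)
lb_type_resp <- ljung_box(
  residuals(fit, type = "response")$.resid, lag = 36, dof = 6
)

# The same test at lag = 2 * m = 24, for comparison with lag = 36.
lb_innov_24 <- aug %>% features(.innov, ljung_box, lag = 24, dof = 6)

lb_resid
lb_innov
lb_type_innov
lb_type_resp
lb_innov_24
```

The lag = 24 line is only there to see how sensitive the conclusion is to the choice of lag; it does not explain why 36 was chosen in the textbook.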
Referred here by Forecasting: Principles and Practice, by Rob J Hyndman and George Athanasopoulos