Obtaining different Ljung-Box statistics for the same residuals

The example below is from the FPP3 textbook section on seasonal ARIMA models. The value of the Ljung-Box statistic, lb1, matches the one in the textbook. However, when I extract the residuals directly with the residuals() function and compute the Ljung-Box statistic lb2, the value is different. If I log-transform the data before fitting the model, I get values equal to lb2 whether I use augment() (lb3) or extract the residuals (lb4). Why is lb1 different from the other values I obtained?

In addition, the residual diagnostics page recommends using lag = 2*m for the ljung_box() function, where m is the seasonal period, unless 2*m is particularly large. Here, however, we use 36 rather than 24 (m = 12 for monthly data). Is there a particular reason for this?

library(fpp3)

h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)
fit <- h02 %>%
  model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))

(lb1 <- augment(fit) %>%
  features(.resid, ljung_box, lag = 36, dof = 6))
#> # A tibble: 1 x 3
#>   .model                                             lb_stat lb_pvalue
#>   <chr>                                                <dbl>     <dbl>
#> 1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    57.9   0.00163

(lb2 <- ljung_box(residuals(fit)$.resid, lag = 36, dof = 6))
#     lb_stat   lb_pvalue 
#50.71198622  0.01044742 

logh02 <- h02 %>%
  mutate(Cost=log(Cost))
fit2 <- logh02 %>%
  model(ARIMA(Cost ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
(lb3 <- augment(fit2) %>%
  features(.resid, ljung_box, lag = 36, dof = 6))
# # A tibble: 1 x 3
# .model                                        lb_stat lb_pvalue
# <chr>                                           <dbl>     <dbl>
#   1 ARIMA(Cost ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    50.7    0.0104

(lb4 <- ljung_box(residuals(fit2)$.resid, lag = 36, dof = 6))
#     lb_stat   lb_pvalue 
#50.71198622  0.01044742 

residuals() isn't the right function here

suppressPackageStartupMessages({
  library(dplyr)
  library(fpp3)
})

h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)

fit <- h02 %>%
  model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))

augment(fit) %>%
  features(.resid, ljung_box, lag = 36, dof = 6)
#> # A tibble: 1 x 3
#>   .model                                             lb_stat lb_pvalue
#>   <chr>                                                <dbl>     <dbl>
#> 1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    57.9   0.00163

ljung_box(augment(fit)$.resid, lag = 36, dof = 6)
#>      lb_stat    lb_pvalue 
#> 57.917489215  0.001633128

Created on 2021-01-05 by the reprex package (v0.3.0.9001)

As @technocrat has pointed out, residuals() returns different results here. This is because of the log transformation used in the model: residuals(fit) returns the residuals from the ARIMA model applied to log(Cost), whereas augment(fit)$.resid returns the difference between the original (unlogged) observations and the fitted values. The residuals from the ARIMA model are also available as augment(fit)$.innov.

library(fpp3)

h02 <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)/1e6)
fit <- h02 %>%
  model(ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))

augment(fit) %>%
  features(.resid, ljung_box, lag = 36, dof = 6)
#> # A tibble: 1 x 3
#>   .model                                             lb_stat lb_pvalue
#>   <chr>                                                <dbl>     <dbl>
#> 1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    57.9   0.00163

augment(fit) %>% 
  features(.innov, ljung_box, lag = 36, dof = 6)
#> # A tibble: 1 x 3
#>   .model                                             lb_stat lb_pvalue
#>   <chr>                                                <dbl>     <dbl>
#> 1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    50.7    0.0104

residuals(fit) %>%  
  features(.resid, ljung_box, lag = 36, dof = 6)
#> # A tibble: 1 x 3
#>   .model                                             lb_stat lb_pvalue
#>   <chr>                                                <dbl>     <dbl>
#> 1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    50.7    0.0104

Created on 2021-01-07 by the reprex package (v0.3.0)

I've now updated the book to explain this distinction and to consistently use the innovation residuals when doing Ljung-Box tests.
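
For completeness, the same distinction can be reproduced directly from residuals() via its type argument. This is a minimal sketch rather than part of the original reply; it assumes the fit object above and that the installed fabletools accepts type = "innovation" (the default) and type = "response":

aug <- augment(fit)

# Response residuals: observations minus back-transformed fitted values
all.equal(aug$.resid, aug$Cost - aug$.fitted)

# residuals() defaults to innovation residuals (here, on the log scale)
all.equal(residuals(fit, type = "innovation")$.resid, aug$.innov)

# Asking for response residuals should reproduce augment(fit)$.resid
all.equal(residuals(fit, type = "response")$.resid, aug$.resid)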

Thanks, I appreciated learning about the distinction!

lag = 2*m gives a fairly different result from lag = 3*m. Is there a way to reconcile the different conclusions?

augment(fit) %>%
  features(.innov, ljung_box, lag = 24, dof = 6)
#  .model                                             lb_stat lb_pvalue
#  <chr>                                                <dbl>     <dbl>
#1 ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))    23.7     0.166

I'm pretty sure that using different lags for a time series will necessarily give different Ljung-Box statistics, and the difference might be exaggerated for a series that requires differencing.

suppressPackageStartupMessages({
 library(dplyr)
 library(fpp3)
})

set.seed(137)
ljung_box(rnorm(100),lags = 24)
#>    lb_stat  lb_pvalue 
#> 0.03400665 0.85369267
ljung_box(rnorm(100),lags = 36)
#>   lb_stat lb_pvalue 
#> 0.3918684 0.5313189

unitroot_ndiffs(rnorm(100))
#> ndiffs 
#>      0

h02 <- PBS %>%
 filter(ATC2 == "H02") %>%
 summarise(Cost = sum(Cost)/1e6)

unitroot_ndiffs(h02$Cost)
#> ndiffs 
#>      1

Created on 2021-01-07 by the reprex package (v0.3.0.9001)

Thanks @technocrat. I modelled the h02 dataset as ARIMA(2,1,3)(0,1,1)[12], which is what ARIMA() selects with stepwise = FALSE. Even with first differencing (and seasonal differencing) accounted for, we would reach different conclusions at the 5% significance level depending on whether we use lag 24 or lag 36. @robjhyndman Would it make sense to study how volatile this p-value is depending on the lag?

fit <- h02 %>% model(mod = ARIMA(log(Cost), stepwise = FALSE))

augment(fit) %>%
  features(.innov, ljung_box, lag=24, dof=6)
# # A tibble: 1 x 3
# .model lb_stat lb_pvalue
# <chr>    <dbl>     <dbl>
#   1 mod       21.5     0.255

augment(fit) %>%
  features(.innov, ljung_box, lag=36, dof=6)
# # A tibble: 1 x 3
# .model lb_stat lb_pvalue
# <chr>    <dbl>     <dbl>
#   1 mod       46.1    0.0301
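
One way to gauge how volatile the p-value is would be to compute the test over a range of lags. The following is a minimal sketch rather than part of the original post; it assumes the fit object above, dof = 6 (the six estimated ARMA coefficients), and monthly data (m = 12):

aug <- augment(fit)
lags <- 12:48
pvals <- sapply(lags, function(l) ljung_box(aug$.innov, lag = l, dof = 6)["lb_pvalue"])

# Plot the p-value against the number of lags, with the 5% level marked
plot(lags, pvals, type = "b", xlab = "lag", ylab = "Ljung-Box p-value")
abline(h = 0.05, lty = 2)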

The reason it becomes significant at lag 36 but not at lag 24 is the spikes in the ACF beyond lag 24. They are unlikely to have much effect on the forecasts and can probably be ignored. See https://robjhyndman.com/hyndsight/ljung-box-test/ for a discussion of the effect of series length and number of lags on the p-value.
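
Those spikes can be inspected directly. A minimal sketch (not part of the original reply), assuming the fit object above and the feasts ACF()/autoplot() interface:

augment(fit) %>%
  ACF(.innov, lag_max = 36) %>%
  autoplot()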
