Augmented Dickey-Fuller test with AIC/BIC

sunny_oxford · March 17, 2022, 11:03am

Hi, I am using the Multiple Bubbles package to perform an ADF test with AIC/BIC to try to establish the autoregressive order of the data. I have run the following code, yet I struggle to understand the output:

ADF_IC(US1ytimeseries[,2], adflag = 10, mflag = 1, IC = 1)
$`ADF Statistic using AIC`
[1] -2.236572

Can anyone help with the interpretation of the result?
Thanks in advance!

gueyenono · March 17, 2022, 3:38pm

Hi @sunny_oxford ,

Are you familiar with statistical tests in general? The idea is that you want to test whether there is good evidence against a (null) hypothesis about your data. In your case, the null hypothesis is that your time series has a unit root (i.e. it is nonstationary). The value you obtained is the test statistic, and you need to compare it with a critical value that you can find in a table. This table will help: Augmented Dickey-Fuller Table | Real Statistics Using Excel

In an ADF test:

the null hypothesis is that the series is nonstationary
the alternative hypothesis is that the series is stationary

You pick a critical value in these tables based on:

whether your ADF test contains no constant/no trend, a constant only or a constant and a trend (I'm not familiar with the package you are using so I can't tell which one it is just from looking at your code - please provide the correct name of the package since Multiple Bubbles is not a valid R package name)
the level of significance you are going for

If your computed test statistic is less than (i.e. more negative than) the critical value, you reject the null hypothesis. If it is greater than the critical value, you cannot reject the null hypothesis, and you treat your series as nonstationary.
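As a minimal sketch of this decision rule in R: `adf_decision()` is a hypothetical helper, not part of any package, and the critical values are the commonly tabulated large-sample Dickey-Fuller values for a regression with a constant and no trend. That choice is an assumption about which model `ADF_IC(..., mflag = 1)` fits; for other specifications, take the values from the matching table.

```r
# Hypothetical helper: left-tailed ADF decision rule.
# H0: unit root (nonstationary); H1: stationary.
adf_decision <- function(stat, crit) {
  if (stat < crit) {
    "reject H0: evidence against a unit root"
  } else {
    "fail to reject H0: a unit root cannot be ruled out"
  }
}

# Assumed approximate large-sample critical values for the
# constant-only (no trend) Dickey-Fuller regression.
crit_constant <- c("1%" = -3.43, "5%" = -2.86, "10%" = -2.57)

# Test statistic reported by ADF_IC() in the first post.
stat <- -2.236572

# Apply the rule at each significance level (named by level).
decisions <- sapply(crit_constant, adf_decision, stat = stat)
print(decisions)
```

With these assumed values, -2.236572 is above every critical value, so the unit root is not rejected at any of the three levels.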

sunny_oxford · March 17, 2022, 6:29pm

Hi @gueyenono, thank you for your reply. Indeed, I am familiar with the statistical test. My overall problem is that I have a time series which is clearly nonstationary, with quite a high degree of autocorrelation. Hence, I thought I would run multiple ADF tests and compare them using AIC to establish the order of integration of the data, which is where I am struggling in a very practical sense. I have realized there are multiple packages for ADF tests in R (aTSA, tseries, urca, MultipleBubbles). Let me summarize them below.

library(tseries)
adf.test(US1ytimeseries[,2], k=5)

with the output being:

Augmented Dickey-Fuller Test

data:  US1ytimeseries[, 2]
Dickey-Fuller = -3.02, Lag order = 5, p-value = 0.1505
alternative hypothesis: stationary

Here, my understanding is that we fail to reject the null hypothesis (at the usual significance levels) and conclude that the time series is nonstationary. Could we infer anything else from this?
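To see what the tseries statistic is computed from, here is a minimal base-R sketch of the ADF regression at a fixed number `k` of lagged differences. According to the tseries documentation, `adf.test()` always includes a constant and a linear trend, so `type = "trend"` below should correspond to it; tseries then interpolates its p-value from a table, which this sketch does not reproduce. `adf_stat()` is a hypothetical helper, and the simulated random walk stands in for `US1ytimeseries[,2]`.

```r
# Hypothetical helper: ADF t-statistic on rho in
#   diff(y)_t = [a] + [b * t] + rho * y_{t-1} + sum_j gamma_j * diff(y)_{t-j} + e_t
# fitted with lm() for a fixed number k of lagged differences.
adf_stat <- function(y, k = 0, type = c("none", "drift", "trend")) {
  type <- match.arg(type)
  dy <- diff(y)
  n <- length(dy)
  idx <- (k + 1):n                    # rows with all k lagged differences available
  X <- data.frame(resp = dy[idx],     # diff(y)_t
                  ylag = y[idx])      # y_{t-1}
  for (j in seq_len(k)) {
    X[[paste0("dlag", j)]] <- dy[idx - j]  # diff(y)_{t-j}
  }
  if (type == "trend") X$tt <- idx    # linear time trend
  # "none" drops the intercept; "drift" and "trend" keep it
  fml <- if (type == "none") resp ~ 0 + . else resp ~ .
  fit <- lm(fml, data = X)
  coef(summary(fit))["ylag", "t value"]
}

set.seed(1)
y_rw <- cumsum(rnorm(300))            # simulated random walk (has a unit root)
adf_stat(y_rw, k = 5, type = "trend")
```

The statistic is the ordinary t-ratio on `ylag`, but under the null it follows the Dickey-Fuller distribution, which is why it is compared with Dickey-Fuller critical values rather than Student t ones.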
If instead I use the aTSA package, I get the following:

library(aTSA)
adf.test(US1ytimeseries[,2], nlag = 5)

with the output being:

Augmented Dickey-Fuller Test 
alternative: stationary 

Type 1: no drift no trend 
     lag    ADF p.value
[1,]   0 -0.101   0.614
[2,]   1 -0.226   0.579
[3,]   2 -0.356   0.541
[4,]   3 -0.677   0.437
[5,]   4 -0.907   0.354
Type 2: with drift no trend 
     lag   ADF p.value
[1,]   0 -2.24  0.2361
[2,]   1 -2.77  0.0704
[3,]   2 -2.91  0.0486
[4,]   3 -3.63  0.0100
[5,]   4 -4.06  0.0100
Type 3: with drift and trend 
     lag   ADF p.value
[1,]   0 -2.14  0.5150
[2,]   1 -2.67  0.2951
[3,]   2 -2.75  0.2604
[4,]   3 -3.40  0.0557
[5,]   4 -3.67  0.0293
---- 
Note: in fact, p.value = 0.01 means p.value <= 0.01

This is where it gets a little trickier: I understand the difference between the three models, but how should I interpret them in this case? And what about the different lags? I cannot find any guide to interpreting this package's output online.
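For reference, the three aTSA types correspond to the standard ADF regressions below, where $p$ is the number of lagged differences. I am assuming this is what the `lag` column counts, so `lag = 0` would be the plain Dickey-Fuller regression. In every case the null hypothesis is $\rho = 0$ (a unit root), and the reported ADF value is the $t$-statistic of $\hat\rho$.

```latex
\begin{aligned}
\text{Type 1:}\quad \Delta y_t &= \rho\, y_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \varepsilon_t \\
\text{Type 2:}\quad \Delta y_t &= \alpha + \rho\, y_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \varepsilon_t \\
\text{Type 3:}\quad \Delta y_t &= \alpha + \beta t + \rho\, y_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \varepsilon_t
\end{aligned}
```

Read this way, Type 1 never rejects the unit root, Type 2 rejects it at the 5% level from lag 2 onwards, and Type 3 only at lag 4. The conclusion therefore depends on both the deterministic terms and the lag order, which is one reason to pick the lag order with an information criterion rather than scanning every row.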

Then, given that my main interest is in identifying the order of autocorrelation, I noticed that the MultipleBubbles package lets me run the ADF test and the AIC selection at once. Hence:

library(MultipleBubbles)
ADF_IC(US1ytimeseries[,2], adflag = 12, mflag = 1, IC = 1)

# Note: I used mflag = 1 because I suspect there is a constant term in the ADF regression

with output:

$`ADF Statistic using AIC`
[1] -2.236572

Now, I am not at all sure how to interpret the "[1]" in this case, and I have not found anything useful online.
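Presumably `ADF_IC()` fits the ADF regression for each lag order up to `adflag`, keeps the one with the lowest information criterion and reports that regression's statistic, so the single number is a test statistic, not a lag order. Below is a minimal base-R sketch of that idea, not the package's own code: `adf_ic_sketch()` is a hypothetical helper, it always includes a constant (by analogy with `mflag = 1`), and it fits every candidate on the same sample so that the AIC or BIC values are comparable.

```r
# Hypothetical helper: choose the number of lagged differences k in
#   diff(y)_t = a + rho * y_{t-1} + sum_{j=1..k} gamma_j * diff(y)_{t-j} + e_t
# by AIC (or BIC), then return the chosen k and the t-statistic on rho.
adf_ic_sketch <- function(y, max_lag = 10, ic = c("AIC", "BIC")) {
  ic <- match.arg(ic)
  dy <- diff(y)
  n <- length(dy)
  idx <- (max_lag + 1):n                    # common sample for every candidate k
  fits <- lapply(0:max_lag, function(k) {
    X <- data.frame(resp = dy[idx], ylag = y[idx])
    for (j in seq_len(k)) X[[paste0("dlag", j)]] <- dy[idx - j]
    lm(resp ~ ., data = X)                  # constant included
  })
  crit <- sapply(fits, if (ic == "AIC") AIC else BIC)
  best <- which.min(crit)
  list(lag = best - 1,
       stat = coef(summary(fits[[best]]))["ylag", "t value"])
}

set.seed(7)
y_sim <- cumsum(rnorm(400))                 # simulated random walk
adf_ic_sketch(y_sim, max_lag = 10, ic = "AIC")
```

The returned statistic is then compared with constant-only Dickey-Fuller critical values, exactly as for a test run at a fixed lag.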

Finally, I tried the urca package:

library(urca)

interp_urdf <- function(urdf, level = "5pct") { ... }  # code taken from Hank Roark's GitHub

dmeanUS1ytimeseries <- diff(US1ytimeseries[,2], lag = 1, differences = 1)

#========================================================
# ADF test of level variables
#========================================================

# Level
adf.t.n = ur.df(US1ytimeseries[,2], type = "none", selectlags = "AIC")
adf.t.d = ur.df(US1ytimeseries[,2], type = "drift", selectlags = "AIC")
adf.t.t = ur.df(US1ytimeseries[,2], type = "trend", selectlags = "AIC")

# 1st difference
adf.d.n = ur.df(dmeanUS1ytimeseries, type = "none", selectlags = "AIC")
adf.d.d = ur.df(dmeanUS1ytimeseries, type = "drift", selectlags = "AIC")
adf.d.t = ur.df(dmeanUS1ytimeseries, type = "trend", selectlags = "AIC")

#========================================================
# Automatic interpretation using Hank Roark's procedure
#========================================================

# Level
interp_urdf(adf.t.n, "5pct")
interp_urdf(adf.t.d, "5pct")
interp_urdf(adf.t.t, "5pct")

# 1st difference
interp_urdf(adf.d.n, "5pct")
interp_urdf(adf.d.d, "5pct")
interp_urdf(adf.d.t, "5pct")

with output:

interp_urdf(adf.t.n, "5pct")
========================================================================
At the 5pct level:
The model is of type none
tau1: The null hypothesis is not rejected, unit root is present
========================================================================
interp_urdf(adf.t.d, "5pct")
========================================================================
At the 5pct level:
The model is of type drift
tau2: The first null hypothesis is not rejected, unit root is present
phi1: The second null hypothesis is not rejected, unit root is present
and there is no drift.
========================================================================
interp_urdf(adf.t.t, "5pct")
========================================================================
At the 5pct level:
The model is of type trend
tau3: The first null hypothesis is not rejected, unit root is present
phi3: The second null hypothesis is not rejected, unit root is present
and there is no trend
phi2: The third null hypothesis is not rejected, unit root is present
there is no trend, and there is no drift
========================================================================
interp_urdf(adf.d.n, "5pct")
========================================================================
At the 5pct level:
The model is of type none
tau1: The null hypothesis is rejected, unit root is not present
========================================================================
interp_urdf(adf.d.d, "5pct")
========================================================================
At the 5pct level:
The model is of type drift
tau2: The first null hypothesis is rejected, unit root is not present
phi1: The second null hypothesis is rejected, unit root is not present
and there is drift.
========================================================================
interp_urdf(adf.d.t, "5pct")
========================================================================
At the 5pct level:
The model is of type trend
tau3: The first null hypothesis is rejected, unit root is not present
phi3: The second null hypothesis is rejected, unit root is not present
and there may or may not be trend
phi2: The third null hypothesis is rejected, unit root is not present
there may or may not be trend, and there may or may not be drift
========================================================================
Warning messages:
1: In interp_urdf(adf.d.t, "5pct") : Presence of trend is inconclusive.
2: In interp_urdf(adf.d.t, "5pct") :
Presence of trend and drift is inconclusive.
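For reading this output: in urca the tau statistics test $\rho = 0$ alone ($\tau_1$ for the model without deterministic terms, $\tau_2$ with drift, $\tau_3$ with drift and trend), while the phi statistics are F-type tests of joint restrictions in the Dickey-Fuller framework. With the drift model $\Delta y_t = \alpha + \rho\, y_{t-1} + \dots$ and the trend model $\Delta y_t = \alpha + \beta t + \rho\, y_{t-1} + \dots$, the null hypotheses are:

```latex
\begin{aligned}
\tau_1, \tau_2, \tau_3 &:\ \rho = 0 \\
\phi_1 &:\ \rho = 0,\ \alpha = 0 \quad \text{(drift model)} \\
\phi_2 &:\ \rho = 0,\ \alpha = 0,\ \beta = 0 \quad \text{(trend model)} \\
\phi_3 &:\ \rho = 0,\ \beta = 0 \quad \text{(trend model)}
\end{aligned}
```

Rejecting a phi null only says that at least one of the listed coefficients is nonzero, which is why the helper reports the presence of trend and drift as inconclusive for the differenced series.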

Hence, my understanding is that, since the first-differenced time series is stationary, the original time series is AR(1). Now my question is: how can I prove this formally?
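A short worked equation connects the two terms used here. Take an AR(1) process and subtract $y_{t-1}$ from both sides:

```latex
\begin{aligned}
y_t &= \phi\, y_{t-1} + \varepsilon_t \\
\Delta y_t = y_t - y_{t-1} &= (\phi - 1)\, y_{t-1} + \varepsilon_t = \rho\, y_{t-1} + \varepsilon_t
\end{aligned}
```

The unit-root null $\rho = 0$ is the case $\phi = 1$, a random walk, for which $\Delta y_t = \varepsilon_t$ is stationary. A series whose level is nonstationary but whose first difference is stationary is integrated of order 1, I(1). That is a statement about the order of integration; the autoregressive order of the differenced series is a separate question, usually settled with an information criterion.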

gueyenono · March 17, 2022, 6:36pm

It is now clear to me that you understand what you are doing.

Could you upload your data somewhere and make it available (if you can legally do so)? This will enable me to help you better.

Also, the [1] in your output has nothing to do with the test result. It is R's way of telling you that this number is the first element of a vector (a sequence of values). If you run 1:200 in your console, you will see a number in brackets at the start of each new line, giving the position of the first value on that line.
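A small illustration of how R prints a named list. The list below is built by hand to mimic the printed output in the first post; it is not produced by MultipleBubbles. The `[1]` is just the index of the first element of the numeric vector stored in that list element, and extracting the element by name (or with `[[1]]`) gives the bare number.

```r
# Hand-built stand-in for the object printed by ADF_IC() (hypothetical).
out <- list(`ADF Statistic using AIC` = -2.236572)

print(out)                             # shows the $`...` name, then [1] -2.236572
stat <- out[["ADF Statistic using AIC"]]
stat                                   # [1] -2.236572
length(stat)                           # [1] 1, i.e. a vector holding a single value
```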

Send me the data and I'll send you a response to your concern.

sunny_oxford · March 17, 2022, 7:01pm

Dear @gueyenono, thank you for your help. Unfortunately, I am not allowed to share this dataset, but it would help me if you simulated some data; I would then take the differences into account.
Many thanks,
Diego

gueyenono · March 18, 2022, 8:53pm

I am on a trip, but I promise to get to your question as soon as I reach my destination.

gueyenono · March 24, 2022, 2:05pm

@sunny_oxford I really wanted to get back to you earlier, but was quite busy. It might be a bit too late for this to help, but since I gave my word, here it is.

In order to determine the order of integration of a series, you want to:

Determine whether the series is stationary.
If it isn't, difference it once and test whether the differenced series is stationary.
If it is, you may conclude that the series is integrated of order 1 (I(1)). If the differenced series is still nonstationary, you may difference it a second time and apply the ADF test again.
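The steps above can be sketched as a loop in base R. This is a simplified stand-in for repeated `ur.df()` calls, not urca's implementation: `adf_none()` and `integration_order()` are hypothetical helpers, the lag order is fixed at one lagged difference instead of being chosen by AIC, and the decision uses the tabulated 5% critical value of about -1.95 for the case without constant or trend.

```r
# Hypothetical helper: ADF t-statistic with no constant and no trend,
#   diff(y)_t = rho * y_{t-1} + gamma * diff(y)_{t-1} + e_t
adf_none <- function(y) {
  dy <- diff(y)
  n <- length(dy)
  resp <- dy[2:n]         # diff(y)_t
  ylag <- y[2:n]          # y_{t-1}
  dlag <- dy[1:(n - 1)]   # diff(y)_{t-1}
  fit <- lm(resp ~ 0 + ylag + dlag)
  coef(summary(fit))["ylag", "t value"]
}

# Difference until the ADF test rejects a unit root and return the number
# of differences d (the estimated order of integration), or NA if the
# series is still nonstationary after max_d differences.
integration_order <- function(y, max_d = 2, crit = -1.95) {
  for (d in 0:max_d) {
    if (adf_none(y) < crit) return(d)
    y <- diff(y)
  }
  NA_integer_
}

set.seed(123)
y_walk <- cumsum(rnorm(500))   # a simulated random walk
integration_order(y_walk)
```

Fixing the lag keeps the sketch short; in practice the lag order and the deterministic terms should be chosen as discussed earlier in the thread.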

I follow this process in the code below, which you can copy and paste:

# Simulate a nonstationary series (a random walk) ----

set.seed(123)
n <- 500
e <- rnorm(n)
y <- rep(x = 0, times = n)

for(t in 2:length(y)){
  y[t] <- y[t-1] + e[t]
}

plot(y, type = "l")


# Test whether the series contains a unit root (is the series nonstationary?) ----

library(urca)

test_results <- ur.df(y = y, type = "none", selectlags = "AIC")
test_results


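# Extract the test statistic and the 5% critical value ----

stat <- attributes(test_results)$teststat |> as.vector() # -0.3692569
cval <- attributes(test_results)$cval[1, "5pct"] # -1.95
stat < cval # FALSE

# The statistic is above the 5% critical value, so there is not enough evidence to reject the null of a unit root (i.e. the series is nonstationary)
# Note: coef(attributes(test_results)$testreg)[1, 4] (0.7120939 here) is the ordinary t-test p-value from lm(); it does not follow the Dickey-Fuller distribution, so it should not be used for this decision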


# Now, let's difference the series...

dy <- diff(y)

plot(dy, type = "l")


# ... and test for unit root

test_results2 <- ur.df(y = dy, type = "none", selectlags = "AIC")
test_results2

stat <- attributes(test_results2)$teststat |> as.vector() # -17.09826
cval <- attributes(test_results2)$cval[1, "5pct"] # -1.95
stat < cval # TRUE

# This time, the statistic is far below the critical value, so we have enough evidence to reject the null hypothesis (i.e. the differenced series is stationary)

# CONCLUSION: This series is integrated of order 1 (it is an I(1) series)

system · April 14, 2022, 2:06pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.