I am using auto.arima with multiple exogenous regressors, so xreg is a matrix. My dependent variable and all of the exogenous regressors have NA values at some time points, so my y:X matrix (the dependent variable in the first column and the regressors in the remaining columns) has some rows of NAs. If I run auto.arima on this kind of data, it stops with an error, apparently because it believes the regressors are multicollinear. Specifically, the code of auto.arima includes this check:
if (min(sv)/sum(sv) < .Machine$double.eps) { stop("xreg is rank deficient") }
Thus, auto.arima throws an error if xreg contains several rows of NAs (or perhaps even a single row; I have not checked).
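To make the check concrete, here is a minimal base-R sketch of the singular-value ratio it computes. The helper sv_ratio is hypothetical and is not the forecast package's code; in particular, the intercept column and the dropping of incomplete rows before the SVD are assumptions. Some NA handling must happen before the SVD, because base svd() refuses input with missing values.

```r
# Hypothetical helper mirroring the quoted check; NOT the forecast package's code.
sv_ratio <- function(xreg) {
  X <- cbind(1, as.matrix(xreg))                     # intercept column (assumption)
  X <- X[stats::complete.cases(X), , drop = FALSE]   # drop rows with any NA (assumption)
  sv <- svd(X)$d                                     # singular values
  min(sv) / sum(sv)
}

set.seed(1)
n <- 100
xreg <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# Same regressors with three all-NA rows
xreg_na <- xreg
xreg_na[c(5, 20, 21), ] <- NA

# Same regressors plus an exact copy of x1 (column-wise rank deficiency)
xreg_dup <- cbind(xreg, x3 = xreg[, "x1"])

# Base svd() stops on missing values, so NA rows cannot reach it unhandled
svd_fails_on_na <- inherits(try(svd(xreg_na), silent = TRUE), "try-error")

ratio_na  <- sv_ratio(xreg_na)    # NA rows dropped, columns still independent
ratio_dup <- sv_ratio(xreg_dup)   # duplicated column, ratio close to zero

print(c(na_rows = ratio_na, duplicated_column = ratio_dup))
```

Under these assumptions, rows of NAs alone leave the ratio far from zero, while a duplicated column drives it towards zero. Whether the duplicated-column ratio falls strictly below .Machine$double.eps depends on floating-point round-off, so the sketch prints the ratios rather than asserting the threshold.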
Intuitively, I do not see why a few NA rows should be a serious problem for the Kalman filter (?) running in the background of auto.arima. Indeed, if I disable this if statement in the code, the modified auto.arima fits my model successfully. At least it returns some results; I cannot guarantee they are always meaningful.
So is it really a problem if the xreg matrix suffers from row-wise rather than column-wise rank deficiency? And if row-wise rank deficiency is actually harmless, should the code of auto.arima be updated to allow it?
Edit: Although this happens with my data, I cannot replicate the behavior using simulated data. So the problem is probably not just the presence of several rows of NAs. Still, disabling the if statement gives OK results for my data. I am puzzled...
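One way to investigate the difference between real and simulated data is to compare the rank of the regressor matrix on its complete rows with its number of columns. The sketch below is only an illustration of one possible mechanism, not a diagnosis of my data: it builds a hypothetical dummy regressor d that is nonzero only in periods where another regressor is NA, so that d becomes constant once the incomplete rows are dropped. All variable names are made up for the example.

```r
set.seed(42)
n  <- 60
x1 <- rnorm(n)
d  <- as.numeric(seq_len(n) %in% 10:12)  # hypothetical dummy, active in periods 10-12
x1[10:12] <- NA                          # x1 is missing in exactly those periods
xreg <- cbind(x1 = x1, d = d)

# Keep only rows without NA and add an intercept column
cc   <- stats::complete.cases(xreg)
X_cc <- cbind(intercept = 1, xreg[cc, , drop = FALSE])

rank_cc <- qr(X_cc)$rank   # 2: d is all zeros on the complete rows
ncol_cc <- ncol(X_cc)      # 3

# Number of distinct values per regressor on the complete rows;
# a count of 1 marks a column that is collinear with the intercept
distinct_cc <- apply(xreg[cc, , drop = FALSE], 2, function(col) length(unique(col)))

print(c(rank = rank_cc, columns = ncol_cc))
print(distinct_cc)
```

If the rank on the complete rows is smaller than the number of columns, the deficiency is column-wise after all, and it is the pattern of missing values, not the NA rows as such, that produces it.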