Hello,
I have been studying time series analysis (I still am) and I have a basic doubt: we start with our time data and apply transformations to make it suitable for building a model. The transformed data becomes stationary, etc., and we then build the model on the transformed data. The transformations seem to move us further and further away from the original data... Isn't the model supposed to describe the original data? How can it do that if it is built on transformed data, which derives from the original data but is very different from it? The original data may have had trend, seasonality, etc., and we remove those to build the model... But the goal is to build a model that describes, and can make forecasts on, data that looks like the original data.
You're describing a modelling process that is often used for ARIMA models. Other models don't necessarily follow the same process.
For an ARIMA model, the various components are usually identified systematically, each one removed in order to identify the next. But all components are part of the model, and when it comes to forecasting, all are applied. For example, suppose you use a log transformation and a difference transformation, then identify a suitable ARMA model for what's left. The full model includes all of these, and the forecasts are obtained by undoing the transformations in reverse order: the forecasts from the ARMA model are calculated, then the differencing is reversed, then the results are exponentiated to obtain forecasts on the original scale. (I'm ignoring bias adjustment for simplicity.) So the final forecasts are of the original data.
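Here is a minimal sketch of that round trip in Python, assuming statsmodels is available; the series and the ARMA(1,1) order are invented purely for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical positive series with trend (a stand-in for real data).
rng = np.random.default_rng(0)
y = np.exp(np.cumsum(0.02 + 0.05 * rng.standard_normal(200)))

# Forward transformations: log, then first difference.
z = np.diff(np.log(y))

# Fit an ARMA(1,1) to the (roughly stationary) transformed series.
fit = ARIMA(z, order=(1, 0, 1)).fit()

# Forecast on the transformed scale, then undo each transformation
# in reverse order: cumsum reverses differencing, exp reverses the log.
h = 10
z_fc = fit.forecast(steps=h)              # ARMA forecasts of the differences
log_fc = np.log(y[-1]) + np.cumsum(z_fc)  # reverse the differencing
y_fc = np.exp(log_fc)                     # reverse the log (no bias adjustment)
print(y_fc)                               # forecasts on the original scale
```

Note that the reversal order is the mirror image of the transformation order: log was applied first, so it is undone last.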
By analogy, sometimes we transform a function f(t) in the time domain into its frequency-domain version F(\omega). We then solve the problem in the frequency domain because it is mathematically easier there, get a solution, and then convert that frequency-domain solution back into a time-domain solution...
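As a tiny numpy illustration of that pattern (the signals here are arbitrary): circular convolution, which is awkward to compute directly in the time domain, becomes a pointwise product in the frequency domain, so we transform, multiply, and transform back:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # arbitrary time-domain signal
h = np.array([0.5, 0.25, 0.0, 0.0])  # arbitrary filter

# Transform to the frequency domain, solve there (a simple product),
# then invert back to the time domain.
X, H = np.fft.fft(x), np.fft.fft(h)
y = np.fft.ifft(X * H).real  # circular convolution of x and h
print(y)
```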
Thanks
Additionally, if I might: in ARIMA models, the response variable Y_t is modelled as a regression whose regressors are the current and past errors and/or the lagged values Y_{t-n} of the response...
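For concreteness (my notation, not from any particular text), the ARMA(1,1) case would be

Y_t = c + \phi Y_{t-1} + \epsilon_t + \theta \epsilon_{t-1},

with c a constant, \phi and \theta the coefficients, and \epsilon_t white noise.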
When I think of a time series, deterministic or stochastic, I think of a function of time, Y_t = f(t). Is it possible to convert an AR, MA, ARMA, etc. model to express Y_t as a deterministic function of time plus additive noise \epsilon, i.e. Y_t = f(t) + \epsilon(t)?