Hello everyone,
I wonder if anyone has suggestions for what I am trying to do.
I am using fable's ARIMA function to fit many different linear models with ARIMA errors to my financial time series. I have 40 candidate factors, I form all possible combinations of 3 factors, and I fit all 9K+ models using the ARIMA function plus parallelization. I collect all these models into a mable, and that part works fine.
However, I noticed that when I save that mable to an RDS file, not only is the file big on disk (a few gigabytes), but saving and loading it also takes a long time. I wanted a solution that gets rid of both issues.
My approach was that, instead of saving the mable, I build a table with the formula (as a string) of the model that was fitted for each combination of 3 factors. Formulas look like: "Volume ~ Inflation + PrimeRate + HousePriceNY + pdq(2,1,0) + PDQ(1,0,0)"
This is much less information to save and load. I then re-constitute the ARIMA models by explicitly passing the orders p,d,q and P,D,Q, so the algorithm does not need to search for them.
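For readers who want to try the formula-table idea, here is a minimal base R sketch. The factor names beyond the three in the example formula, the helper `make_formula`, and the fixed orders are all hypothetical; in practice each row would carry the orders that the original search selected for that model.

```r
# Hypothetical factor names; the real list has 40 entries.
factors <- c("Inflation", "PrimeRate", "HousePriceNY", "Unemployment", "GDP")

# Every unordered group of 3 factors, one group per column.
combos <- combn(factors, 3)

# Assemble "response ~ a + b + c + pdq(p,d,q) + PDQ(P,D,Q)" from its parts.
make_formula <- function(xregs, response, pdq, PDQ) {
  sprintf("%s ~ %s + pdq(%s) + PDQ(%s)",
          response,
          paste(xregs, collapse = " + "),
          paste(pdq, collapse = ","),
          paste(PDQ, collapse = ","))
}

# For illustration only, every model gets the same (assumed) orders.
formulas <- vapply(
  seq_len(ncol(combos)),
  function(j) make_formula(combos[, j], "Volume", c(2, 1, 0), c(1, 0, 0)),
  character(1)
)

# Small table to save instead of the mable (column names follow the post).
mydisplayTable <- data.frame(
  ModelName = paste0("m", seq_along(formulas)),
  Formulas  = formulas
)
```

With 40 factors, `choose(40, 3)` gives 9880 rows, which matches the 9K+ models mentioned above.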
I basically re-constitute each model using something like this:
df_model_new <- myModelData_long |> model(!!mydisplayTable[i,]$ModelName := ARIMA(as.formula(mydisplayTable[i,]$Formulas[[1]])))
I put this into a foreach loop with a %dopar% and then cbind all the reconstituted models, so I essentially skip the saving and loading of the gigantic mable.
This speeds things up a lot and saves disk space, but I would like to go even further. Why not pass the coefficients as well as the pre-computed pdq and PDQ orders? That way, when I run my reconstituteModels code, the ARIMA algorithm would not need to estimate anything and would just create the structures that I need.
I looked into fable's ARIMA documentation, and it seems I can do that if, instead of building my formula strings as
"Volume ~ Inflation + PrimeRate + HousePriceNY + pdq(2,1,0) + PDQ(1,0,0)"
I build it as something like this:
"Volume ~ xreg(Inflation, PrimeRate, HousePriceNY, fixed = list(Inflation = 0.03, PrimeRate = 0.3546, HousePriceNY = 0.53254)) + pdq(2,1,0, fixed = list(ar1 = 0.0324, ...)) + PDQ(1,0,0, fixed = list(sar1 = ...))"
I think I can build strings like that by fishing the parameters out of the $fit$par structure. Although it is nothing out of this world in terms of complexity, building all those strings could be a rather tedious task.
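One way the string building might be automated is sketched below in base R. It assumes the coefficients for one model are available as a table with `term` and `estimate` columns (the shape that `tidy()` returns for a mable), that ARMA terms are named like ar1/ma1 and seasonal ones like sar1/sma1, that every other term is a regressor, and that the model has at least one regressor and no constant or intercept term. The helpers `fixed_list` and `build_fixed_formula` are hypothetical, and the ar2 and sar1 values are made up for illustration.

```r
# Hypothetical coefficient table for one model; values are illustrative only.
coefs <- data.frame(
  term     = c("ar1", "ar2", "sar1", "Inflation", "PrimeRate", "HousePriceNY"),
  estimate = c(0.0324, -0.21, 0.15, 0.03, 0.3546, 0.53254)
)

# Render the selected rows as text, e.g. "list(ar1 = 0.0324, ar2 = -0.21)".
# Estimates are written with 15 significant digits.
fixed_list <- function(coefs, keep) {
  pairs <- sprintf("%s = %.15g", coefs$term[keep], coefs$estimate[keep])
  paste0("list(", paste(pairs, collapse = ", "), ")")
}

# Split the terms into non-seasonal ARMA, seasonal ARMA, and regressors,
# then assemble the formula string with a fixed = list(...) in each special.
build_fixed_formula <- function(response, coefs, pdq, PDQ) {
  is_arma  <- grepl("^(ar|ma)[0-9]+$", coefs$term)
  is_sarma <- grepl("^s(ar|ma)[0-9]+$", coefs$term)
  is_xreg  <- !(is_arma | is_sarma)  # assumption: everything else is a regressor
  sprintf(
    "%s ~ xreg(%s, fixed = %s) + pdq(%s, fixed = %s) + PDQ(%s, fixed = %s)",
    response,
    paste(coefs$term[is_xreg], collapse = ", "), fixed_list(coefs, is_xreg),
    paste(pdq, collapse = ","), fixed_list(coefs, is_arma),
    paste(PDQ, collapse = ","), fixed_list(coefs, is_sarma)
  )
}

f <- build_fixed_formula("Volume", coefs, pdq = c(2, 1, 0), PDQ = c(1, 0, 0))
```

The resulting string parses with `as.formula(f)`, so it can be stored in the same Formulas column. Whether fixing every coefficient really removes all of the estimation work inside ARIMA is something I would check on a handful of models before converting all of them.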
At this point I decided to stop and ask the community if there are smarter ways to do what I want.
Thanks!