Saving and reconstituting a large collection of ARIMA models

Hello everyone,
I wonder if anyone has suggestions for what I want to do.

I am using fable's ARIMA() function to fit several different linear models with ARIMA errors to my financial time series. Basically, I have 40 candidate factors; I form all possible combinations of 3 factors and fit all 9K+ models using ARIMA() plus parallelization. I collect all these models into a mable, and that works fine.

However, I noticed that when I save that mable to an RDS file, not only is it large on disk (a few gigabytes), but saving and loading the RDS also takes a long time. I wanted a solution that gets rid of these issues.

My approach, then, was to not save the mable, but instead build a table with the formula (as a string) for each model fitted for each combination of 3 factors. The formulas look like: "Volume ~ Inflation + PrimeRate + HousePriceNY + pdq(2,1,0) + PDQ(1,0,0)"

This is much less information to save and load. I then reconstitute the ARIMA models by passing the orders p,d,q and P,D,Q explicitly, so the algorithm does not need to search for them.
I basically reconstitute each model using something like this:

df_model_new <- myModelData_long |> model(!!mydisplayTable[i,]$ModelName := ARIMA(as.formula(mydisplayTable[i,]$Formulas[[1]])))

I put this into a foreach loop with %dopar% and then cbind all the reconstituted models together, so I essentially skip saving and loading the gigantic mable.
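For context, here is a minimal sketch of that reconstitution loop. It assumes `mydisplayTable` has columns `ModelName` and `Formulas` (the formula strings with explicit pdq()/PDQ() orders), `myModelData_long` is the original tsibble, and a doParallel backend is acceptable; names and worker count are illustrative.

```r
# Sketch: rebuild each model from its saved formula string in parallel,
# then bind the one-column mables back into a single wide mable.
library(fable)
library(fabletools)
library(foreach)
library(doParallel)

registerDoParallel(cores = 4)  # adjust to your machine

mable_list <- foreach(i = seq_len(nrow(mydisplayTable)),
                      .packages = c("fable", "fabletools")) %dopar% {
  myModelData_long |>
    model(!!mydisplayTable$ModelName[i] :=
            ARIMA(as.formula(mydisplayTable$Formulas[[i]])))
}

# Combine the single-model mables column-wise, as in the original workflow
mbl_reconstituted <- do.call(cbind, mable_list)
```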

This speeds things up a lot and saves disk space, but I wanted to go even further. Why pass only the pre-cooked pdq and PDQ orders, and not the coefficients too? That way, when I run my reconstituteModels code, the ARIMA algorithm doesn't need to estimate anything; it just creates the structures I need.

I looked into fable's ARIMA documentation, and it seems I can do that if, instead of building my formula strings as

"Volume ~ Inflation + PrimeRate + HousePriceNY + pdq(2,1,0) + PDQ(1,0,0)"

I build it as something like this:
"Volume ~ xreg(Inflation, PrimeRate, HousePriceNY, fixed = list(Inflation = 0.03, PrimeRate = 0.3546, HousePriceNY = 0.53254)) + pdq(2,1,0, fixed = list(ar1 = 0.0324, ...)) + PDQ(1,0,0, fixed = list(sar1 = ...))"

I think I can build strings like that by fishing the parameters out of the $fit$par structure. Although nothing out of this world in terms of complexity, building all those strings could be a rather boring task.
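One hedged sketch of that string-building step, using `fabletools::tidy()` to pull the estimates rather than reaching into `$fit$par` directly. It assumes a mable `mbl` with a model column named `arima_fit`, and that the ARMA coefficients are named ar1/ma1/sar1/etc. as fable reports them; the regex split and the pdq orders shown are illustrative.

```r
# Sketch: turn a fitted model's coefficients into a fixed-coefficient
# formula string like the one above.
library(fabletools)
library(dplyr)

coefs <- tidy(mbl) |> filter(.model == "arima_fit")

# Separate ARMA terms (ar1, ma1, sar1, sma1, ...) from regressor terms
is_arma <- grepl("^s?(ar|ma)\\d+$", coefs$term)
arma_terms <- coefs[is_arma, ]
xreg_terms <- coefs[!is_arma & coefs$term != "constant", ]

fixed_str <- function(tbl) {
  paste0(tbl$term, " = ", signif(tbl$estimate, 6), collapse = ", ")
}

formula_str <- sprintf(
  "Volume ~ xreg(%s, fixed = list(%s)) + pdq(2, 1, 0, fixed = list(%s))",
  paste(xreg_terms$term, collapse = ", "),
  fixed_str(xreg_terms),
  fixed_str(arma_terms)
)
```

The same pattern would extend to a PDQ(..., fixed = list(...)) term for the seasonal coefficients.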

At this point I decided to stop and ask the community if there are smarter ways to do what I want.


In general, saving and loading arbitrary data can be a lot faster using the qs package.
I would also try the butcher package to see whether it can reduce your object sizes, though I'm unsure how likely that is.
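For reference, qs is essentially a drop-in replacement for saveRDS()/readRDS(); a minimal sketch, assuming `mbl` is your mable and the file name is illustrative:

```r
# qsave()/qread() serialize with fast compression, typically much
# quicker and smaller on disk than saveRDS()/readRDS()
library(qs)

qsave(mbl, "models.qs")     # instead of saveRDS(mbl, "models.rds")
mbl2 <- qread("models.qs")  # instead of readRDS("models.rds")
```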

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.