Saving and reconstituting a large collection of ARIMA models

fniski · January 25, 2024, 2:39pm

Hello everyone,
I wonder if anyone has suggestions for what I am trying to do.

I am using fable and its ARIMA function to fit several different linear models with ARIMA errors to my financial time series. I have 40 candidate factors, form all possible combinations of 3 factors, and fit all 9K+ models using the ARIMA function with parallelization. I collect all these models into a mable, and that works fine.

However, I noticed that when I save that mable to an RDS file, not only is it large on disk (a few gigabytes), but saving and loading it also takes a long time. I want a solution that gets rid of these issues.

My approach was, instead of saving the mable, to build a table with the formula (as a string) of the model that was fitted for each combination of 3 factors. The formulas look like: "Volume ~ Inflation + PrimeRate + HousePriceNY + pdq(2,1,0) + PDQ(1,0,0)"
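
A minimal sketch of how such a formula table could be built with base R. The factor names (Factor1 ... Factor40) and the fixed orders pdq(2,1,0) and PDQ(1,0,0) are placeholders, not the actual data; in practice the orders would be read from each fitted model. The column names ModelName and Formulas mirror the ones used in the reconstitution call in this post.

```r
# Hypothetical factor names standing in for the 40 candidate factors.
factors <- paste0("Factor", 1:40)

# All unordered groups of 3 factors: choose(40, 3) = 9880 combinations.
combos <- combn(factors, 3, simplify = FALSE)

# One formula string per combination. The ARIMA orders are placeholders;
# in practice they would be taken from each fitted model.
formula_table <- data.frame(
  ModelName = vapply(combos, paste, character(1), collapse = "_"),
  Formulas = vapply(
    combos,
    function(x) {
      paste0("Volume ~ ", paste(x, collapse = " + "),
             " + pdq(2,1,0) + PDQ(1,0,0)")
    },
    character(1)
  ),
  stringsAsFactors = FALSE
)

nrow(formula_table)        # 9880
formula_table$Formulas[1]  # "Volume ~ Factor1 + Factor2 + Factor3 + pdq(2,1,0) + PDQ(1,0,0)"
```

A table like this is a plain data frame of short strings, so it should serialize to a far smaller file than the mable itself.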

This is much less information to save and load. I then reconstitute the ARIMA models by explicitly passing the orders p,d,q and P,D,Q, so the algorithm does not need to search for them.
I reconstitute each model using something like this:

df_model_new <- myModelData_long |> model(!!mydisplayTable[i,]$ModelName := ARIMA(as.formula(mydisplayTable[i,]$Formulas[[1]])))

I put this into a foreach loop with %dopar% and then cbind all the reconstituted models, so I essentially skip saving and loading the gigantic mable.

This is much faster and saves disk space, but I would like to speed it up even further. Besides passing the precomputed pdq and PDQ orders, why not also pass the coefficients? That way, when I run my reconstituteModels code, the ARIMA algorithm would not need to estimate anything and would just create the structures that I need.

I looked into fable's ARIMA documentation and it seems that I can do that if instead of building my formula strings as

"Volume ~ Inflation + PrimeRate + HousePriceNY + pdq(2,1,0) + PDQ(1,0,0)"

I build them as something like this:
"Volume ~ xreg(Inflation, PrimeRate, HousePriceNY, fixed = list(Inflation = 0.03, PrimeRate = 0.3546, HousePriceNY = 0.53254)) + pdq(2, 1, 0, fixed = list(ar1 = 0.0324, ...)) + PDQ(1, 0, 0, fixed = list(sar1 = ...))"

I think I can build strings like that by pulling the parameters out of the $fit$par structure. Although it is not complicated, building all those strings could be rather tedious.
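
A rough sketch of how the string building could be automated with base R, assuming the coefficients for each model are already available as named numeric vectors (for example, pulled from $fit$par or from tidy() on the mable). The helpers fixed_arg() and build_fixed_formula() are hypothetical names, not fable functions, and the ar2 and sar1 values below are made up for illustration. It also assumes the regressor names are syntactically valid R names.

```r
# Format a named numeric vector as ", fixed = list(name = value, ...)".
# Returns "" when there are no coefficients to fix.
fixed_arg <- function(coefs) {
  if (length(coefs) == 0) return("")
  # format = "g" with 15 significant digits drops trailing zeros.
  values <- formatC(coefs, digits = 15, format = "g")
  pairs <- paste0(names(coefs), " = ", values)
  paste0(", fixed = list(", paste(pairs, collapse = ", "), ")")
}

# Assemble the full formula string from orders and coefficient vectors.
build_fixed_formula <- function(response, xreg_coefs, order, arma_coefs,
                                seasonal_order, seasonal_coefs) {
  xreg_part <- paste0("xreg(", paste(names(xreg_coefs), collapse = ", "),
                      fixed_arg(xreg_coefs), ")")
  pdq_part <- paste0("pdq(", paste(order, collapse = ", "),
                     fixed_arg(arma_coefs), ")")
  PDQ_part <- paste0("PDQ(", paste(seasonal_order, collapse = ", "),
                     fixed_arg(seasonal_coefs), ")")
  paste0(response, " ~ ",
         paste(c(xreg_part, pdq_part, PDQ_part), collapse = " + "))
}

# Example call; ar2 and sar1 are made-up illustrative values.
f <- build_fixed_formula(
  response = "Volume",
  xreg_coefs = c(Inflation = 0.03, PrimeRate = 0.3546, HousePriceNY = 0.53254),
  order = c(2, 1, 0),
  arma_coefs = c(ar1 = 0.0324, ar2 = -0.1),
  seasonal_order = c(1, 0, 0),
  seasonal_coefs = c(sar1 = 0.2)
)
f
```

The resulting string can be stored in the same table as the plain formulas and passed through as.formula() in the same way. Whether fixing every coefficient removes all of the estimation cost is something I have not verified, so it would be worth timing a few models first.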

At this point I decided to stop and ask the community if there are smarter ways to do what I want.

Thanks!

nirgrahamuk · January 25, 2024, 3:33pm

In general, saving and loading arbitrary data can be a lot faster with the qs package.
I would also try the butcher package (https://butcher.tidymodels.org/) to see if it can reduce your object sizes, though I'm unsure how likely that is.
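
A minimal sketch of the save/load alternatives, using a small stand-in list rather than a real mable. The base R part uses saveRDS() with compress = FALSE, which skips compression and is usually faster at the cost of a larger file. The qs part only runs if the qs package is installed.

```r
# A small stand-in object; a real mable would be much larger.
big_object <- replicate(200, rnorm(1000), simplify = FALSE)

rds_path <- tempfile(fileext = ".rds")

# compress = FALSE skips gzip compression: usually faster, but a larger file.
saveRDS(big_object, rds_path, compress = FALSE)
restored <- readRDS(rds_path)
stopifnot(identical(big_object, restored))

# If the qs package is installed, qsave()/qread() can be used the same way.
if (requireNamespace("qs", quietly = TRUE)) {
  qs_path <- tempfile(fileext = ".qs")
  qs::qsave(big_object, qs_path)
  restored_qs <- qs::qread(qs_path)
  stopifnot(identical(big_object, restored_qs))
}
```

Timing both variants on the actual mable, for example with system.time(), would show whether either is fast enough for this use case.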
