I've been working on a project modelling and forecasting time series data. I've found a great new set of tools in tsibble/fable/feasts, which make it far easier to write code that generates and compares multiple models.
What I'm finding, though, is that model processing time is now an even more significant bottleneck: I've scaled up my code, but not my actual processing power.
A few methods seem immediately useful, but at the cost of model detail: aggregating on the time index (processing daily data is cheaper than half-hourly) and shortening the training set (modelling one year of data is cheaper than two). These are compromises that may invalidate the goals of the model, though (I don't want a daily model, etc.).
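For concreteness, here is a minimal sketch of both compromises, assuming `my_data` is a tsibble with a half-hourly datetime index. The column name `timestamp` and the aggregation functions are placeholders, not part of my actual code:

```r
library(dplyr)
library(tsibble)
library(lubridate)

# Compromise 1: aggregate the half-hourly index up to daily before modelling.
# sum()/mean() are illustrative only; use whatever aggregation suits the data.
daily_data <- my_data %>%
  index_by(date = as_date(timestamp)) %>%
  summarise(
    my_value = sum(my_value),
    my_variable = mean(my_variable)
  )

# Compromise 2: keep the half-hourly resolution but train on the last year only.
recent_data <- my_data %>%
  filter(timestamp >= max(timestamp) - years(1))
```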
I've moved over to a cloud instance and am running my code on a bigger machine, but I'm not seeing the performance improvements I had hoped for. I had assumed that more CPU and RAM would be a simple (if perhaps not cost-effective or optimised) solution, but I'm not certain the extra resources are even being used by what I'm running.
At the moment I am working with half-hourly data for about a year, with a few extra variables alongside my prediction target. My code looks something like this:
my_data %>%
  model(
    ARIMA_xmpl = fable::ARIMA(
      my_value ~ pdq(1, 0, 1) + PDQ(1, 1, 1, period = "day") + my_variable
    ),
    ARIMA_xmpl_2 = fable::ARIMA(
      my_value ~ pdq(1, 0, 1) + PDQ(1, 1, 1, period = "day") + my_variable + my_other_variable
    )
  )
What is available to me to:
- check whether my process is memory-bound or CPU-bound? (see the first sketch below)
- ensure I'm utilising all my CPUs? (see the second sketch below)
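For the first question, the rough starting point I'm aware of (not fable-specific) is to compare CPU time with wall-clock time for a single fit, and to watch the process with an OS tool like `top` or `htop` while it runs: steadily climbing memory use or heavy swapping points to memory pressure, while one core pinned at 100% with elapsed roughly equal to user time points to a single-threaded, CPU-bound fit. A sketch, reusing one of the models above:

```r
# CPU time vs wall-clock time for one model fit.
# user + sys close to elapsed  -> one core doing all the work (CPU-bound)
# elapsed much larger than CPU -> time spent swapping, on I/O, or waiting
timing <- system.time(
  fit <- my_data %>%
    model(
      ARIMA_xmpl = fable::ARIMA(
        my_value ~ pdq(1, 0, 1) + PDQ(1, 1, 1, period = "day") + my_variable
      )
    )
)
print(timing)

# Line-level profile of time and memory allocation (needs the profvis package).
profvis::profvis({
  my_data %>%
    model(
      ARIMA_xmpl = fable::ARIMA(
        my_value ~ pdq(1, 0, 1) + PDQ(1, 1, 1, period = "day") + my_variable
      )
    )
})

# How many cores does R actually see on the cloud instance?
parallel::detectCores()
```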
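For the second question, my understanding (worth verifying against the fabletools documentation for your version) is that `model()` can estimate in parallel via the future framework when a plan is set, with the speed-up coming from distributing key/model-definition combinations across workers rather than from making a single ARIMA fit multi-threaded. A sketch, assuming that behaviour:

```r
library(future)

# Spread estimation across local worker processes; leave one core for the OS.
plan(multisession, workers = max(1, parallel::detectCores() - 1))

fits <- my_data %>%
  model(
    ARIMA_xmpl = fable::ARIMA(
      my_value ~ pdq(1, 0, 1) + PDQ(1, 1, 1, period = "day") + my_variable
    ),
    ARIMA_xmpl_2 = fable::ARIMA(
      my_value ~ pdq(1, 0, 1) + PDQ(1, 1, 1, period = "day") + my_variable + my_other_variable
    )
  )

plan(sequential)  # switch back to single-process execution afterwards
```

If that is how the parallelism works, then with a single series and only two model definitions the gains would be limited, which might explain why the bigger machine isn't helping: the expensive part would be each individual ARIMA fit, running on one core. Is that reading correct, and is there anything else I should be doing?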