Extracting the best models from a mable

chris_short · February 24, 2020, 10:06pm

How would the "best" model for each series, against some criterion, be extracted from an accuracy tibble?

How can that selection of models be used to create a new mable against the full dataset?

The two questions I think I'm asking are:

As someone new to R, I'm looking for some guidance on the R syntax, which I assume involves a group_by statement. I've written it in Python as a test (find the index of the minimum of a criterion in each relevant group in the tibble), but I can't figure out the R equivalent.

Secondly, how do I pass that selection of models back to the model statement to use against the full dataset?

And thanks for such a fantastic set of forecasting tools too.

Referred here by Forecasting: Principles and Practice, by Rob J Hyndman and George Athanasopoulos

technocrat · February 25, 2020, 6:14am

Hi, and welcome!

For syntax questions, please see the FAQ "What's a reproducible example (`reprex`) and how do I do one?", because such questions are hard to answer in the abstract.

A mable (the model table used by the fable package) may or may not be the best launching pad for model evaluation and comparison. Take a look at Forecasting: Principles and Practice, by the authors of the forecast package, especially chapter 3, before committing to a specific tool.

chris_short · February 25, 2020, 11:26am

Hi @technocrat,

Thank you. I complicated my question by proposing an assumed solution.

Let me rephrase:

For a mable created on a training set, with say 5 models and 50 series, is there a fable function that extracts the "best" model (on a given criterion for forecast accuracy) for each series to a new mable?

This is to enable the selected models to be refit against the full data set.

Perhaps I've missed it in the documentation, but I've tried to look closely.

(So I jury-rigged a solution involving Python and R and too much manual intervention, but was hoping for a single method to keep me in the R space for a better workflow.)

mitchelloharawild · February 25, 2020, 2:11pm

Here's my interpretation of your question (please correct me if this isn't right):

You're estimating multiple models. Let's say ETS() and ARIMA() as an example, but any model can be used here.
You evaluate model performance using accuracy().
Looking at the accuracy output, you see that sometimes ETS() is better, other times ARIMA() is preferred. Because of this, you want to create a new mable that uses whichever model is best.

One approach would be to use if_else() to select between models in a mable based on some condition. The example code below demonstrates how this could be done.

```r
library(fpp3)
fit <- tourism %>% 
  group_by(State) %>% 
  summarise(Trips = sum(Trips)) %>% 
  model(
    ets = ETS(Trips), 
    arima = ARIMA(Trips)
  )

fit_accuracy <- accuracy(fit, measures = lst(MASE)) %>% 
  pivot_wider(names_from = .model, values_from = MASE) %>% 
  select(-.type)
fit_accuracy
#> # A tibble: 8 x 3
#>   State                ets arima
#>   <chr>              <dbl> <dbl>
#> 1 ACT                0.709 0.725
#> 2 New South Wales    0.731 0.719
#> 3 Northern Territory 0.769 0.776
#> 4 Queensland         0.714 0.797
#> 5 South Australia    0.714 0.723
#> 6 Tasmania           0.746 0.778
#> 7 Victoria           0.712 0.728
#> 8 Western Australia  0.619 0.650

best_fit <- fit %>% 
  transmute(
    State, # Need to keep key variables for a valid mable
    best_fit = if_else(fit_accuracy$ets < fit_accuracy$arima, ets, arima)
  )
best_fit
#> # A mable: 8 x 2
#> # Key:     State [8]
#>   State              best_fit                
#>   <chr>              <model>                 
#> 1 ACT                <ETS(M,A,N)>            
#> 2 New South Wales    <ARIMA(0,1,1)(0,1,1)[4]>
#> 3 Northern Territory <ETS(M,N,M)>            
#> 4 Queensland         <ETS(A,N,A)>            
#> 5 South Australia    <ETS(M,N,A)>            
#> 6 Tasmania           <ETS(M,N,M)>            
#> 7 Victoria           <ETS(M,N,M)>            
#> 8 Western Australia  <ETS(M,N,M)>
```

Created on 2020-02-25 by the reprex package (v0.3.0)
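
The example above selects models using training-set accuracy and stops before the refit step from the original question. A rough sketch of the full workflow might look like the following. It assumes a train/test split at the end of 2015 (my choice, not from the thread), ranks models on out-of-sample MASE, and then uses fabletools' refit() with reestimate = TRUE, which as I understand it keeps each selected model specification and re-estimates its parameters on the new data. The match() step is an added precaution so the comparison does not depend on row order.

```r
library(fpp3)

# Full data: total trips per state, as in the example above.
full <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

# Assumed split: train on data up to 2015, hold out 2016-2017 for testing.
train <- full %>%
  filter(year(Quarter) <= 2015)

fit <- train %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips)
  )

# Out-of-sample MASE for each state, one column per model.
test_accuracy <- fit %>%
  forecast(h = "2 years") %>%
  accuracy(full, measures = lst(MASE)) %>%
  pivot_wider(names_from = .model, values_from = MASE) %>%
  select(-.type)

# Align the accuracy rows to the mable by key rather than by row order.
idx <- match(fit$State, test_accuracy$State)

best_fit <- fit %>%
  transmute(
    State, # keep the key so the result is still a valid mable
    best_fit = if_else(test_accuracy$ets[idx] < test_accuracy$arima[idx], ets, arima)
  )

# Re-estimate the selected model for each state on the full dataset.
best_full <- refit(best_fit, full, reestimate = TRUE)
best_full
```

From there, forecast(best_full, h = ...) should behave as it would for any other mable.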

chris_short · February 25, 2020, 7:17pm

Thanks, Mitchell. That was both a clearer description of the task and the example solution I was after.

I have a few more keys and possibly other measures, but that gets me moving again.

Much appreciated.

Cheers
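
With more keys, more than two models, or other measures, the pairwise if_else() comparison gets unwieldy. The group-wise minimum asked about at the top of the thread can be sketched with dplyr's group_by() and slice_min() (the latter needs dplyr 1.0.0 or later). The table below is just the MASE values from the accuracy output earlier in the thread, typed in by hand in long format. With a real accuracy tibble you would start from accuracy(fit) directly.

```r
library(dplyr)

states <- c("ACT", "New South Wales", "Northern Territory", "Queensland",
            "South Australia", "Tasmania", "Victoria", "Western Australia")

# MASE values copied from the accuracy output earlier in the thread,
# arranged with one row per State/model pair.
acc <- tibble(
  State  = rep(states, times = 2),
  .model = rep(c("ets", "arima"), each = 8),
  MASE   = c(0.709, 0.731, 0.769, 0.714, 0.714, 0.746, 0.712, 0.619,
             0.725, 0.719, 0.776, 0.797, 0.723, 0.778, 0.728, 0.650)
)

# For each series (here, each State), keep the row with the smallest MASE.
best <- acc %>%
  group_by(State) %>%
  slice_min(MASE, n = 1, with_ties = FALSE) %>%
  ungroup()

best
```

For several key columns, list them all in group_by(). To rank on a different measure, swap MASE for that column.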
