The two questions I think I'm asking are:
How would the "best" model for each series, against some criterion, be extracted from an accuracy tibble?
How can that selection of models be used to create a new mable against the full dataset?
As someone new to R, I'm looking for some guidance on the R syntax, which I assume involves a group_by statement. (I've written it in Python as a test: find the index of the minimum of a criterion in each relevant group in the tibble. But I can't figure out the R equivalent.)
Secondly - how to pass that selection of models back to the model statement to use against the full dataset?
And thanks for such a fantastic set of forecasting tools too.
The mable package may or may not be the best launching pad for model evaluation and comparison. Take a look at Forecasting: Principles and Practice by the authors of the forecast package, especially chapter 3, before committing to a specific tool.
Thank you. I complicated my question by raising an assumed solution.
Let me rephrase:
For a mable created on a training set with, say, 5 models and 50 series, is there a fable function that extracts the "best" model (on a given forecast-accuracy criterion) for each series into a new mable?
This is to enable the selected models to be refit against the full data set.
Perhaps I've missed it in the documentation, but I've tried to look closely.
(So I jury rigged a solution involving python and R and too much manual intervention - but was hoping for a single method to keep me in the R space for a better workflow)
Here's my interpretation of your question (please correct me if this isn't right):
You're estimating multiple models. Let's say ETS() and ARIMA() as an example, but any model can be used here.
You evaluate model performance using accuracy().
Looking at the accuracy output, you see that sometimes ETS() is better, other times ARIMA() is preferred. Because of this, you want to create a new mable that uses whichever model is best.
One approach would be to use if_else() to select between models in a mable based on some condition. The example code below demonstrates how this could be done.
library(fpp3)
fit <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips)
  )
fit_accuracy <- accuracy(fit, measures = lst(MASE)) %>%
  pivot_wider(names_from = .model, values_from = MASE) %>%
  select(-.type)
fit_accuracy
#> # A tibble: 8 x 3
#>   State                ets arima
#>   <chr>              <dbl> <dbl>
#> 1 ACT                0.709 0.725
#> 2 New South Wales    0.731 0.719
#> 3 Northern Territory 0.769 0.776
#> 4 Queensland         0.714 0.797
#> 5 South Australia    0.714 0.723
#> 6 Tasmania           0.746 0.778
#> 7 Victoria           0.712 0.728
#> 8 Western Australia  0.619 0.650
best_fit <- fit %>%
  transmute(
    State, # Need to keep key variables for a valid mable
    best_fit = if_else(fit_accuracy$ets < fit_accuracy$arima, ets, arima)
  )
best_fit
#> # A mable: 8 x 2
#> # Key:     State [8]
#>   State              best_fit
#>   <chr>              <model>
#> 1 ACT                <ETS(M,A,N)>
#> 2 New South Wales    <ARIMA(0,1,1)(0,1,1)[4]>
#> 3 Northern Territory <ETS(M,N,M)>
#> 4 Queensland         <ETS(A,N,A)>
#> 5 South Australia    <ETS(M,N,A)>
#> 6 Tasmania           <ETS(M,N,M)>
#> 7 Victoria           <ETS(M,N,M)>
#> 8 Western Australia  <ETS(M,N,M)>
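On the group_by part of the original question: with more than two models, a pairwise if_else() gets awkward, and the "index of the minimum in each group" idea can be written directly in dplyr. The sketch below is only an illustration. It uses a small hand-made tibble (the names acc and best are made up here, and the values are copied from three rows of the accuracy table above) standing in for the long-form output of accuracy(), i.e. one row per series and model, before any pivot_wider().

```r
library(dplyr)
library(tibble)

# Toy stand-in for long-form accuracy() output: one row per (State, .model).
# `acc` is a hypothetical name; values are taken from the table above.
acc <- tribble(
  ~State,            ~.model, ~MASE,
  "ACT",             "ets",   0.709,
  "ACT",             "arima", 0.725,
  "New South Wales", "ets",   0.731,
  "New South Wales", "arima", 0.719,
  "Queensland",      "ets",   0.714,
  "Queensland",      "arima", 0.797
)

# For each series, keep the single row with the smallest MASE.
best <- acc %>%
  group_by(State) %>%
  slice_min(MASE, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(State, .model, MASE)

best
```

The .model column of best then names, for each series, which mable column to keep. For fitting the chosen models to the full dataset, fabletools provides refit(), which applies an existing mable to new data; whether the parameters are re-estimated depends on the arguments each model's refit method accepts, so it is worth checking the documentation for the models you use.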