Accuracy figures from bagged models

alstev · September 2, 2021, 12:18pm

I've primarily been using modeltime but having a play around with fable to explore bagging. I'm able to recreate the examples from the fpp3 book but what I would like to do is calculate accuracy metrics for the mean or median of the bagged forecast so I can compare with the standard model e.g. ETS vs baggedETS and so on.

I see the examples to create models, forecast, and calculate accuracy for single series and (for example) I can take my monthly data, create a training set, fit a number of models and then plot the forecast against the full data along with the accuracy.

It's not clear to me how to do the same thing with baggedETS. I can create train data, generate 100 simulations, and then forecast that forwards. I do not see how to calculate accuracy from there though (I guess because the resulting fable also has an additional .rep column compared to a 'normal' forecast) and I can't find any detail about this on the web or in the book.

Can anyone point me towards a tutorial?

Thanks

robjhyndman · September 2, 2021, 11:48pm

Most of the required code is in the book. The only thing left to do is to turn the bagged forecast into a fable object. If you're only interested in point forecasts, we can just create a degenerate distribution based on the mean. To do full distributional forecasting, we would need to create a mixture distribution. Either way, once we have a fable, the accuracy function can be applied. Here is an example using a degenerate distribution based on the mean.

library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble      3.1.4      ✓ tsibble     1.0.1 
#> ✓ dplyr       1.0.7      ✓ tsibbledata 0.3.0 
#> ✓ tidyr       1.1.3      ✓ feasts      0.2.2 
#> ✓ lubridate   1.7.10     ✓ fable       0.3.1 
#> ✓ ggplot2     3.3.5
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()
library(distributional)

# Training data
cement <- aus_production %>%
  filter(year(Quarter) >= 1988, year(Quarter) <= 2008) %>%
  select(Quarter, Cement)
# ETS forecasts
ets_forecasts <- cement %>%
  model(ets = ETS(Cement)) %>%
  forecast(h = 6) %>%
  select(-.model)
# Bagged forecasts (averaged over 100 replicates)
bagged_forecasts <- cement %>%
  model(stl = STL(Cement)) %>%
  generate(new_data = cement, times = 100, bootstrap_block_size = 8) %>%
  select(-.model, -Cement) %>%
  model(ets = ETS(.sim)) %>%
  forecast(h = 6) %>%
  summarise(bagged_mean = mean(.mean)) %>%
  mutate(dist = dist_degenerate(bagged_mean)) %>%
  as_fable(response = "Cement", distribution = dist)
#> Warning: The dimnames of the fable's distribution are missing and have been set
#> to match the response variables.
# Compare accuracy
bind_rows(
    ets_forecasts %>% accuracy(aus_production) %>% mutate(method = "ETS"),
    bagged_forecasts %>% accuracy(aus_production) %>% mutate(method = "Bagged ETS")
  ) %>%
  select(method, everything())
#> # A tibble: 2 × 10
#>   method     .type    ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
#>   <chr>      <chr> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
#> 1 ETS        Test  -157.  178.  161.  -7.55  7.71  1.63  1.38 -0.327
#> 2 Bagged ETS Test  -222.  242.  222. -10.7  10.7   2.25  1.87 -0.403

^{Created on 2021-09-03 by the reprex package (v2.0.1)}

alstev · September 3, 2021, 8:37am

Thanks Rob, really appreciate the example

system · September 10, 2021, 8:37am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.