I have started using fable and I am wondering whether there is a function in fable to calculate the accuracy of prediction intervals for any given forecasting model, or whether we need to extract them and calculate it with a user-defined function. Sharing a reproducible example would be very helpful.
There are a few accuracy measures available in fabletools which allow you to evaluate the accuracy of intervals and distributions.
For intervals, the `winkler()` score is available. For distributions, `percentile_score()` and `CRPS()` are available.
Explanations of how `winkler()` and `percentile_score()` are computed are available here: https://robjhyndman.com/papers/forecasting_state_of_the_art.pdf
There should be plenty of resources online to learn about continuous ranked probability scores (`CRPS()`).
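To give a feel for what the interval score measures, here is a minimal base-R sketch of the Winkler score for a single observation and a single 100(1−α)% interval. This is my own toy illustration of the formula (interval width plus a penalty of 2/α per unit the observation falls outside the interval), not fabletools code:

```r
# Toy Winkler (interval) score for one observation and one interval.
# y     : observed value
# lower : lower interval bound
# upper : upper interval bound
# level : nominal coverage in percent (e.g. 95)
winkler_score_sketch <- function(y, lower, upper, level = 95) {
  alpha <- 1 - level / 100
  width <- upper - lower
  # Penalise observations falling outside the interval by 2/alpha per unit
  penalty <- if (y < lower) {
    2 / alpha * (lower - y)
  } else if (y > upper) {
    2 / alpha * (y - upper)
  } else {
    0
  }
  width + penalty
}

winkler_score_sketch(10, 8, 12, level = 95)  # inside the interval: just the width, 4
winkler_score_sketch(13, 8, 12, level = 95)  # above: 4 + (2 / 0.05) * 1 = 44
```

Narrower intervals score better, but only if they still cover the observation; missing the interval is heavily penalised at high coverage levels.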
Commonly used (and implemented) accuracy measures are organised into the lists `interval_accuracy_measures` and `distribution_accuracy_measures`, and I have used these below. However, it is also possible to create your own list of accuracy measures to use.
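For instance, a custom list mixing point and distribution measures might look something like this. This is a hedged sketch: it assumes the usual fabletools point measures (`RMSE`, `MAE`) and the `CRPS()` measure mentioned above, and that any named list of measure functions can be passed to `accuracy(..., measures = ...)`:

```r
library(fabletools)

# A custom named list of accuracy measures, combining
# point accuracy (RMSE, MAE) with distribution accuracy (CRPS)
my_measures <- list(
  RMSE = RMSE,  # point accuracy
  MAE  = MAE,   # point accuracy
  CRPS = CRPS   # distribution accuracy
)

# Then pass it where the built-in lists are used below, e.g.:
# ... %>% accuracy(us_deaths, measures = my_measures)
```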
```r
library(tsibble)
library(fable)
library(dplyr)

us_deaths <- as_tsibble(USAccDeaths)
us_deaths %>%
  # Withhold a test set of one year
  filter(index < yearmonth("1978 Jan")) %>%
  # Model the training data
  model(ETS(value)) %>%
  # Forecast the test set
  forecast(h = "1 year") %>%
  # Compute interval/distribution accuracy
  accuracy(us_deaths, measures = c(interval_accuracy_measures, distribution_accuracy_measures))
#> # A tibble: 1 x 5
#>   .model     .type winkler percentile  CRPS
#>   <chr>      <chr>   <dbl>      <dbl> <dbl>
#> 1 ETS(value) Test    2036.       91.6  181.
```
Created on 2020-01-28 by the reprex package (v0.3.0)
Is this still valid for time series cross-validation? If we fit a model to various rolling windows, e.g. using `stretch_tsibble()`, can we still get the `winkler`, `percentile_score` and `CRPS` measures? If the answer is yes, how are they summarised across multiple rolling windows?
I don't know of any issues using these measures with cross validation.
You can summarise them in many ways. As the measures are themselves averages, you may consider taking the mean; the median is also reasonable, and I often look at and compare densities of accuracy measures.
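A sketch of how this might look, reusing the `us_deaths` data from above. This assumes `stretch_tsibble()` creates an `.id` key for each rolling origin, so `accuracy()` returns one row of measures per origin, which you can then summarise yourself; the `.init`/`.step` values are arbitrary choices for illustration:

```r
library(tsibble)
library(fable)
library(dplyr)

us_deaths <- as_tsibble(USAccDeaths)

cv_accuracy <- us_deaths %>%
  # Rolling training windows: start with 24 months, grow by 1 month each step
  stretch_tsibble(.init = 24, .step = 1) %>%
  model(ETS(value)) %>%
  forecast(h = "1 year") %>%
  # One row of interval accuracy measures per rolling origin (.id)
  accuracy(us_deaths, measures = interval_accuracy_measures)

# Summarise across origins, e.g. the mean (or median) Winkler score
cv_accuracy %>%
  summarise(mean_winkler   = mean(winkler, na.rm = TRUE),
            median_winkler = median(winkler, na.rm = TRUE))
```

Plotting the distribution of per-origin scores (e.g. a density of the `winkler` column) is another good way to compare models, as noted above.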