Accuracy of prediction intervals in fable

I have started using fable and I am wondering whether there is a function in fable for calculating the accuracy of prediction intervals for a given forecasting model, or whether we need to extract the intervals and compute the accuracy with a user-defined function. A reproducible example would be very helpful.

There are a few accuracy measures available in fabletools for evaluating the accuracy of interval and distributional forecasts.

For intervals, the Winkler score (winkler_score()) is available.
For distributions, percentile_score() and CRPS() are available.

Explanations of how winkler_score() and percentile_score() are computed are available here: https://robjhyndman.com/papers/forecasting_state_of_the_art.pdf
There are plenty of resources online for learning about the continuous ranked probability score (CRPS()).
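
To give a concrete sense of the Winkler score, here is a small hand-rolled illustration for a single interval (winkler_single() below is a hypothetical helper, not a fabletools function): when the observation falls inside the interval the score is just the interval width; otherwise a penalty proportional to the size of the miss is added.

# Illustrative only: Winkler score for a single prediction interval
winkler_single <- function(y, lower, upper, level = 95) {
  alpha <- 1 - level / 100
  (upper - lower) +                      # interval width
    (2 / alpha) * pmax(lower - y, 0) +   # penalty when the observation falls below the interval
    (2 / alpha) * pmax(y - upper, 0)     # penalty when the observation falls above the interval
}
winkler_single(y = 9500, lower = 8000, upper = 10000)   # inside the interval: 2000
winkler_single(y = 10500, lower = 8000, upper = 10000)  # 500 above it: 2000 + 40 * 500 = 22000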

Commonly used (and implemented) accuracy measures are organised into lists named interval_accuracy_measures and distribution_accuracy_measures, and I have used these below. However, it is also possible to create your own list of accuracy measures to use (see the sketch after the example).

library(tsibble)
library(fable)
library(dplyr)
us_deaths <- as_tsibble(USAccDeaths)
us_deaths %>% 
  # Withhold a test set of one year
  filter(index < yearmonth("1978 Jan")) %>% 
  # Model the training data
  model(ETS(value)) %>% 
  # Forecast the test set
  forecast(h = "1 year") %>% 
  # Compute interval/distribution accuracy
  accuracy(us_deaths, measures = c(interval_accuracy_measures, distribution_accuracy_measures))
#> # A tibble: 1 x 5
#>   .model     .type winkler percentile  CRPS
#>   <chr>      <chr>   <dbl>      <dbl> <dbl>
#> 1 ETS(value) Test    2036.       91.6  181.

Created on 2020-01-28 by the reprex package (v0.3.0)
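
As noted above, you can also build your own list of measures and pass it to accuracy(). A minimal sketch (my_measures and its element names are just illustrative labels that become the column names in the output):

# A hypothetical custom list mixing interval, distribution and point measures
my_measures <- list(
  winkler = winkler_score,  # interval accuracy
  crps    = CRPS,           # distribution accuracy
  rmse    = RMSE            # point accuracy
)
# Use it in place of the built-in lists at the end of the pipeline above, e.g.
#   accuracy(us_deaths, measures = my_measures)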


Is this still valid for time series cross-validation? If we fit a model to a series of rolling windows, e.g. using stretch_tsibble(), can we still get the Winkler, percentile and CRPS scores? If so, how are they summarized across the multiple rolling windows?

I don't know of any issues with using these measures in cross-validation.
You can summarise the results in many ways. Since the measures are themselves averages, taking the mean across windows is a natural choice; the median is also reasonable, and I often look at and compare the densities of accuracy measures across windows.
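
A minimal cross-validation sketch along the lines of the example above (the window sizes are arbitrary choices for illustration, and the by argument mentioned in the comments is one way to keep per-window results):

us_deaths %>% 
  # Keep 1973-1977 so that every window's forecasts fall inside us_deaths
  filter(index < yearmonth("1978 Jan")) %>% 
  # Rolling-origin windows: start with 36 months, grow by one month each step
  stretch_tsibble(.init = 36, .step = 1) %>% 
  # Fit the model to each window (windows are identified by .id)
  model(ETS(value)) %>% 
  # Forecast one year ahead from the end of each window
  forecast(h = "1 year") %>% 
  # By default the measures are averaged across all windows;
  # use by = c(".id", ".model") to get one row per window instead
  accuracy(us_deaths, measures = c(interval_accuracy_measures, distribution_accuracy_measures))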
