I have started using fable and I am wondering whether there is a function in fable to calculate the accuracy of prediction intervals for a given forecasting model, or whether we need to extract the intervals and compute the accuracy with a user-defined function. Sharing a reproducible example would be very helpful.

# Accuracy of prediction intervals in fable

There are a few accuracy measures available in fabletools which allow you to evaluate the accuracy of intervals and distributions.

For intervals, the `winkler()` score is available. For distributions, `percentile_score()` and `CRPS()` are available.

Explanations of how `winkler()` and `percentile_score()` are computed are available here: https://robjhyndman.com/papers/forecasting_state_of_the_art.pdf

There should be plenty of resources online to learn about the continuous ranked probability score (`CRPS()`).

Commonly used (and implemented) accuracy measures are organised into lists named `interval_accuracy_measures` and `distribution_accuracy_measures`, which I have used below. However, it is also possible to create your own list of accuracy measures to use.

```r
library(tsibble)
library(fable)
library(dplyr)

us_deaths <- as_tsibble(USAccDeaths)

us_deaths %>%
  # Withhold a test set of one year
  filter(index < yearmonth("1978 Jan")) %>%
  # Model the training data
  model(ETS(value)) %>%
  # Forecast the test set
  forecast(h = "1 year") %>%
  # Compute interval/distribution accuracy
  accuracy(
    us_deaths,
    measures = c(interval_accuracy_measures, distribution_accuracy_measures)
  )
#> # A tibble: 1 x 5
#>   .model     .type winkler percentile  CRPS
#>   <chr>      <chr>   <dbl>      <dbl> <dbl>
#> 1 ETS(value) Test    2036.       91.6  181.
```
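To illustrate the custom-measures option, here is a minimal sketch of building your own measures list; the `winkler_score()` and `CRPS()` functions and the `level` argument are assumed to be as exported by fabletools, and `my_measures` is a hypothetical name:

```r
library(fabletools)

# A custom list of accuracy measures; the list names become the
# column names in the accuracy() output.
my_measures <- list(
  # Assumed: winkler_score() accepts a `level` argument for the
  # interval coverage (80% here instead of the default 95%).
  winkler_80 = function(.dist, .actual, ...) {
    winkler_score(.dist, .actual, level = 80, ...)
  },
  CRPS = CRPS
)

# Used in place of the built-in lists, e.g.:
# fc %>% accuracy(us_deaths, measures = my_measures)
```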

*Created on 2020-01-28 by the reprex package (v0.3.0)*

Is this still valid for time series cross-validation? If we fit a model to various rolling windows, e.g. using `stretch_tsibble()`, can we still get the `winkler`, `percentile_score` and `CRPS` measures? If the answer is yes, how are they summarised across the multiple rolling windows?

I don't know of any issues with using these measures with cross-validation.

You can summarise them in many ways; as the measures are averages, you may consider taking the mean. The median is also reasonable, and I often look at and compare densities of accuracy measures.
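A sketch of how this might look, reusing the `us_deaths` data from above (the `.init` and `.step` window sizes are illustrative assumptions): `stretch_tsibble()` adds an `.id` key for each rolling origin, so `accuracy()` returns one row per window, which can then be averaged.

```r
library(tsibble)
library(fable)
library(dplyr)

us_deaths <- as_tsibble(USAccDeaths)

cv_accuracy <- us_deaths %>%
  # Rolling origins: 4 years of initial training data,
  # stretched forward one year at a time (assumed sizes)
  stretch_tsibble(.init = 48, .step = 12) %>%
  model(ETS(value)) %>%
  forecast(h = "1 year") %>%
  # Compare each window's forecasts against the full data
  accuracy(
    us_deaths,
    measures = c(interval_accuracy_measures, distribution_accuracy_measures)
  )

# One row per rolling window (.id); summarise across windows, e.g. by mean
cv_accuracy %>%
  summarise(across(c(winkler, percentile, CRPS), mean))
```

The final fold forecasts past the end of the data and contributes no accuracy rows, so expect a warning there; the remaining windows are summarised as shown.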
