Prediction interval for a simple average forecast combination

In section 13.4 Forecast combinations of the FPP3 textbook, an example demonstrates how to take the simple average of four models. (I have modified the code below to predict only 2 years worth of values so it can run faster.)

From what I can tell, cafe_fc does not automatically contain prediction interval values for the simple average method (it only contains the point predictions).

On the final step, is there an easy way to plot the prediction intervals only for the 'combination' (simple average) method?

It seems doable by manipulating cafe_fc %>% hilo() a bit, but I'm not really familiar with how to manipulate these kinds of objects (they're in a format like [1458.816, 1611.358]95).

Perhaps more importantly, would such a prediction interval be relatively accurate?

auscafe <- aus_retail %>%
  filter(stringr::str_detect(Industry,"Takeaway")) %>%
  summarise(Turnover = sum(Turnover))
train <- auscafe %>%
  filter(year(Month) <= 2016)
STLF <- decomposition_model(
  STL(log(Turnover) ~ season(window = Inf)),
  ETS(season_adjust ~ season("N"))
cafe_models <- train %>%
    ets = ETS(Turnover),
    stlf = STLF,
    arima = ARIMA(log(Turnover)),
    nnar = NNETAR(log(Turnover))
  ) %>%
  mutate(combination = (ets + stlf + arima + nnar)/4)
cafe_fc <- cafe_models %>%
  forecast(h = "2 years")

cafe_fc %>%
  autoplot(auscafe %>% filter(year(Month) > 2008), level=NULL) +
  xlab("Year") + ylab("$ billion") +
  ggtitle("Australian monthly expenditure on eating out")

The cafe_fc object contains forecast distributions, from which any prediction interval can normally be computed. Let's look at the intervals for the first period.

# Forecast distributions for h=1
cafe_fc %>% filter(Month == min(Month))
#> # A fable: 5 x 4 [1M]
#> # Key:     .model [5]
#>   .model         Month           Turnover .mean
#>   <chr>          <mth>             <dist> <dbl>
#> 1 ets         2017 Jan      N(1535, 1514) 1535.
#> 2 stlf        2017 Jan t(N(7.4, 0.00062)) 1616.
#> 3 arima       2017 Jan t(N(7.3, 0.00057)) 1550.
#> 4 nnar        2017 Jan    t(sample[1000]) 1546.
#> 5 combination 2017 Jan           1561.661 1562.

The first four are a mixture of normal, transformed normal and empirical distributions. The package does not yet combine such diverse distributions, so the combination output is simply the mean instead.

However, if we work with simulated sample paths, it is possible to do what you want.

# Generate future sample paths
cafe_futures <- cafe_models %>%
  generate(h = "2 years", times=500)

# Compute forecast distributions from future sample paths and create fable object
cafe_fc2 <- cafe_futures %>%
  as_tibble() %>%
  group_by(Month, .model) %>%
  summarise(dist = distributional::dist_sample(list(.sim))) %>%
  ungroup() %>%
  as_fable(index=Month, key=.model, distribution=dist, response="Turnover")
#> `summarise()` regrouping output by 'Month' (override with `.groups` argument)
#> Warning: The dimnames of the fable's distribution are missing and have been set
#> to match the response variables.

cafe_fc2 %>% filter(Month == min(Month))
#> # A fable: 5 x 3 [1M]
#> # Key:     .model [5]
#>      Month .model             dist
#>      <mth> <chr>            <dist>
#> 1 2017 Jan arima       sample[500]
#> 2 2017 Jan combination sample[500]
#> 3 2017 Jan ets         sample[500]
#> 4 2017 Jan nnar        sample[500]
#> 5 2017 Jan stlf        sample[500]

Now all four models and the combination are stored as empirical distributions.

We can plot the forecasts for just one of them using filter().

# Plot forecasts with prediction intervals from combination
cafe_fc2 %>%
  filter(.model == "combination") %>%
  autoplot(auscafe %>% filter(year(Month) > 2011)) +
  xlab("Year") + ylab("$ billion") +
  ggtitle("Australian monthly expenditure on eating out")

To check the accuracy of the 95% prediction intervals, we can use a Winkler score (defined at

# Check accuracy of 95% prediction intervals
cafe_fc2 %>% accuracy(auscafe, measures = interval_accuracy_measures, level=95)
#> # A tibble: 5 x 3
#>   .model      .type winkler
#>   <chr>       <chr>   <dbl>
#> 1 arima       Test     397.
#> 2 combination Test     408.
#> 3 ets         Test     344.
#> 4 nnar        Test     362.
#> 5 stlf        Test     504.

Lower is better, so the combination forecast is not particularly good here.

Thanks Rob. This section seems to be one of the final steps in developing a forecasting model for a 'typical' dataset (one that doesn't fall into one of the special categories like very short time series, or low integer counts). It makes sense (and is fundamentally sound) to quote/display the prediction intervals when using these forecast combinations, right?

[Edit: I just noticed you updated the FPP3 web page too, so I guess the prediction intervals created in this way are generally appropriate to include.]

Yes, these prediction intervals should be appropriate. Thanks for prompting me to update the relevant section of the book. I dropped the NNETAR model to get much better results.

I noticed the NNETAR model was now gone. I haven't read up to that section of the book yet, but presumably some assumptions for that model are not satisfied or the resulting residuals are correlated.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.