fable: Combination Models, Methodology and Simple Numerical Example

Hi

Further to fable: ensemble, extract sigma

There is the pdf: robjhyndman/quantile_ensembles: Book chapter on computing quantile forecasts for ensembles (github.com)

COMBINATION FORECASTING
You say p.4
“Quantiles can not simply be averaged, so we need to take account of the correlations between the forecast errors from the component models when producing quantile forecasts."

Mathematically, how is this being calculated? The forecast errors are: (fn(t)-actual(t)), for n component models OOS, and it's easy to compute the covariances or correlations between them. However, that would be ‘fitting’ to the OOS forecasts; and then there’s TOOS (true-out-of-sample) forecasts, i.e. forecasts past the last ‘actual’ entry, where forecast errors cannot be calculated.

The mean forecast should be easy as it’s a simple (or weighted) average. But exactly how are the Combination Forecasts PI’s being calculated? Please can you provide a simple numerical example.

Also in fable why are the forecast means exactly the same from both Ensemble and Combination? In the former they are calculated from a simulation, and the latter analytically.

And, what does this mean?
"Further improvement has been obtained by taking account of the similarity of the ETS and ARIMA forecasts, rather than simply combining the sample paths as with ensemble forecasting."

thanks,
Amarjit

The code for forecasts of combination models is defined in fabletools:::forecast.model_combination().

It calculates the cross-correlations between the models, and then uses var(x) + var(y) + 2*cov(x,y) for the variance. This only applies for Normal distribution forecasts, other distributions just default back to point forecasts.

This is currently somewhat limited, and I plan to extend this functionality with support for:

  1. Quantile ensembling
  2. Distribution mixture ensembling (distributional equivalent to "ENSEMBLE" in linked repo)
  3. Bootstrap samples (equivalent to "ENSEMBLE" in linked repo)

Hi Mitch,

Thanks for the reply. I am having difficulties in understanding! A reproducible 'By-Hand' numerical example based on data in fpp3 may help:

Let's say there are 2 models: arima210 and arima013, each forecasting 5 steps ahead out-of-sample (OOS).

AIM: I want to combine equally weighted.

#===============================
options(pillar.print_max = Inf)
options(pillar.width = Inf)
options(digits = 12)
options(pillar.signif = 12)

library(fpp3)
library(cowplot)
library(distributional)


train <- global_economy %>%
  filter(Code == "CAF") %>%
  filter(Year <= 2012)


fit <- train %>%
  model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
        arima013 = ARIMA(Exports ~ pdq(0,1,3)))

fit %>% select('arima210') %>% report()
fit %>% select('arima210') %>% tidy()
fit %>% select('arima210') %>% glance()
 
fit %>% select('arima013') %>% report()
fit %>% select('arima013') %>% tidy()
fit %>% select('arima013') %>% glance()


f_arima210 <- fit %>%
  forecast(h=5) %>%
  filter(.model=='arima210')
f_arima210

f_arima210$.mean

f_arima210.sigma <- rep(0, 5)
 for (i in 1:5) {	
 	f_arima210.sigma[i] <- t(matrix(unlist(f_arima210[[4]][i])))[2]
 	}  
f_arima210.sigma


f_arima013 <- fit %>%
  forecast(h=5) %>%
  filter(.model=='arima013')
f_arima013

f_arima013$.mean

f_arima013.sigma <- rep(0, 5)
 for (i in 1:5) {	
 	f_arima013.sigma[i] <- t(matrix(unlist(f_arima013[[4]][i])))[2]
 	}  
f_arima013.sigma


test <- global_economy %>%
  filter(Code == "CAF") %>%
  filter(Year > 2012)
test$Exports


p_arima210 <- f_arima210 %>%
  autoplot(bind_rows(train, test))
p_arima013 <- f_arima013 %>%
  autoplot(bind_rows(train, test))
plot_grid(p_arima210, p_arima013, labels = c('f_arima210', 'f_arima013'), ncol = 1)


fc.mean <- 0.5*f_arima210$.mean + 0.5*f_arima013$.mean
fc.mean

fe_arima210 <- f_arima210$.mean - test$Exports
fe_arima013 <- f_arima013$.mean - test$Exports
fe <- data.frame(fe_arima210,fe_arima013)  
COV <- cov(fe)
COV
COR <- cov2cor(COV)
COR

#Var(aX + bY) = a^2*Var(X) + b^2*Var(Y) + 2*a*b*COR(X,Y)*sqrt(Var(X))*sqrt(Var(Y))
a <- 0.5
b <- 0.5
fc.sigma <- sqrt( a^2*f_arima210.sigma^2 + b^2*f_arima013.sigma^2 + 2*a*b*COR[1,2]*f_arima210.sigma*f_arima013.sigma )
fc.sigma


f_arima210.sigma
f_arima013.sigma
COR
fc.sigma

#PI's
L80 <- fc.mean + qnorm(p=.1)*fc.sigma
U80 <- fc.mean + qnorm(p=.9)*fc.sigma
L95 <- fc.mean + qnorm(p=.025)*fc.sigma
U95 <- fc.mean + qnorm(p=.975)*fc.sigma
L80
U80
L95
U95


#===============================
#dynamic forecasts ARIMA, Combination 
f_D5 <- train %>%
  model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
        arima013 = ARIMA(Exports ~ pdq(0,1,3))
        ) %>%
  mutate(Combination = (arima210 + arima013)/2) %>%
  forecast(h = 5, point_forecast = lst(.mean = mean))
f_D5


f_D5 %>%
  hilo()


f_D5 %>%
  autoplot()


fcc.sigma <- rep(0, 15)
for (i in 1:15) {	
  fcc.sigma[i] <- t(matrix(unlist(f_D5[[4]][i])))[2]
}  
fcc.sigma[11:15]


#dynamic forecasts Ensemble
ensemble <- f_D5 %>%
  filter(.model %in% c("arima210", "arima013")) %>%
  summarise(
    Exports = dist_mixture(
      Exports[1], Exports[2],
      weights = c(0.5, 0.5)),
    .mean = mean(Exports),
    .sd = sqrt(variance(Exports))
  ) %>%
  mutate(.model = "Ensemble") %>%
  as_tibble() %>%
  select(.model,Year,Exports,.mean,.sd)
ensemble

I have calculated the combined forecast means: fc.mean, and both the the covariance and correlation matrices:

A combined summation variance including weighting a and b is:
Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2ab*COR(X,Y)*sqrt(Var(X))*sqrt(Var(Y)).

The results from 'By-Hand'

fc.sigma: [1] 2.66196163315 2.99746982500 3.33905502046 3.92779455129 4.39688110002

and from mutate(Combination = (arima210 + arima013)/2)

fcc.sigma[11:15]: [1] 2.63371832249 2.96566439305 3.30365549204 3.88632914027 4.35073920269

are different!

However, from fabletools:::forecast.model_combination(), there is the matrix calculation:

fc_cov <- map_dbl(fc_sd, function(sigma) (diag(sigma) %*% fc_cov %*% t(diag(sigma)))[1, 2])

Although I'm not certain how this is related to Var(aX + bY)?

I would be grateful for guidance.

Cheers,
Amarjit

Here's an example of doing the calculation 'by hand' - hope it helps.

library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✔ tibble      3.1.8          ✔ tsibble     1.1.3     
#> ✔ dplyr       1.0.10         ✔ tsibbledata 0.4.1     
#> ✔ tidyr       1.2.1          ✔ feasts      0.3.0     
#> ✔ lubridate   1.9.0          ✔ fable       0.3.2.9000
#> ✔ ggplot2     3.4.0
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date()    masks base::date()
#> ✖ dplyr::filter()      masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval()  masks lubridate::interval()
#> ✖ dplyr::lag()         masks stats::lag()
#> ✖ tsibble::setdiff()   masks base::setdiff()
#> ✖ tsibble::union()     masks base::union()
library(distributional)
fit <- as_tsibble(USAccDeaths) %>% 
  model(
    a = ARIMA(value ~ 0 + pdq(0,1,1) + PDQ(0,1,1)),
    b = ARIMA(value ~ 1 + pdq(1,0,1) + PDQ(0,1,1))
  ) %>% 
  mutate(comb = (a+b)/2)

# Pull out forecast distributions for calculations
fc_a <- fit %>% select(a) %>% forecast() %>% pull(value)
fc_b <- fit %>% select(b) %>% forecast() %>% pull(value)
fc_comb <- fit %>% select(comb) %>% forecast() %>% pull(value)

# Obtain correlation for component model residuals
resid <- fit %>% 
  select(a,b) %>% 
  residuals() %>% 
  pivot_wider(names_from = ".model", values_from = ".resid") %>% 
  with(cbind(a,b))
cor <- cov2cor(var(resid))

# Compute combined (a+b) forecast variance
sigma <- sqrt(cbind(variance(fc_a), variance(fc_b)))
fc_var <- diag(sigma %*% cor %*% t(sigma))

# Compute combined (a+b) forecast mean
fc_mu <- mean(fc_a) + mean(fc_b)

# Create combined (a+b)/2 forecast distribution
fc_comb_manual <- dist_normal(fc_mu, sd = sqrt(fc_var))/2

waldo::compare(fc_comb, fc_comb_manual, tolerance = 1e-10)
#> `attr(old, 'vars')` is a character vector ('value')
#> `attr(new, 'vars')` is absent

Created on 2023-01-08 with reprex v2.0.2

Thanks!

For the Exports example, the correlation matrix should be from the in-sample residuals, to get the same results as in fable:

# correlation for 2 component models from IN-SAMPLE residuals
 resid <- fit %>% 
   select('arima210','arima013') %>% 
   residuals() %>% 
   pivot_wider(names_from = ".model", values_from = ".resid") %>% 
   with(cbind(arima210,arima013))
 resid

 COR <- cov2cor(var(resid))
 COR

> fc.sigma
[1] 2.63371832249 2.96566439305 3.30365549204 3.88632914027 4.35073920269

> fcc.sigma[11:15]
[1] 2.63371832249 2.96566439305 3.30365549204 3.88632914027 4.35073920269

>  waldo::compare(fc.sigma, fcc.sigma[11:15], tolerance = 1e-10)
✔ No differences

A few more questions:

(A) How is:

Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2ab*COR(X,Y)*sqrt(Var(X))*sqrt(Var(Y)

GENERALLY equivalent to

(diag(sigma) %*% fc_cov %*% t(diag(sigma)) ?

And being applicable for many models, e.g. for combining 3 models I would use

Var(aX + bY + cZ) = a^2Var(X) + b^2Var(Y) + c^2Var(Z) +
2ab*COR(X,Y)*sqrt(Var(X))sqrt(Var(Y)) +
2ac
COR(X,Z)*sqrt(Var(X))sqrt(Var(Z)) +
2bc
COR(Y,Z)*sqrt(Var(Y))*sqrt(Var(Z))

Etc, Etc, Right?

(B) And from the initial post: in fable why are the forecast means exactly the same from both Ensemble and Combination? In the former they are calculated from a MC simulation, and the latter analytically. In other-words, shouldn't there always be some experimental error with simulations?

Amarjit

(A): They are not equivalent. The matrix variance calculation is for Var(X+Y) and the scalars are already applied to X and Y. The forecast method here is recursively computed, so it would first compute aX and bX (simply scaling the models) and then it computes the sum.

(B): They shouldn't be the same, for reasons that you indicate. Can you provide a minimal reproducible example?

(A): In the Exports example

# Method 1
# -------- 
#Var(aX + bY) = a^2*Var(X) + b^2*Var(Y) + 2*a*b*cor(X,Y)*sqrt(Var(X))*sqrt(Var(Y))
 a <- 0.5
 b <- 0.5
 fc.sigma <- sqrt( a^2*f_arima210.sigma^2 + b^2*f_arima013.sigma^2 + 2*a*b*COR[1,2]*f_arima210.sigma*f_arima013.sigma )
 fc.sigma

# Method 2
# --------
#make matrix of forecast sigmas
 msigma <- data.frame(f_arima210.sigma, f_arima013.sigma)
 msigma <- data.matrix(msigma)
 msigma
 
#SAME AS fc.sigma
 fc.var <- msigma %*% COR %*% t(msigma)
 fc.var.diag <- diag(fc.var)
 fc.var.sigma <- sqrt(fc.var.diag)/2
 fc.var.sigma

#compare
 waldo::compare(fc.sigma, fc.var.sigma, tolerance = 1e-10) 

These are the same.

Method 1: Requires the forecast sigmas, the correlation matrix from the in-sample residuals, and the weights 0.5's.
Method 2: The scaling is the division by 2, Correct?

(B): In Rob's auscafe example GitHub - robjhyndman/quantile_ensembles: Book chapter on computing quantile forecasts for ensembles, they are not the same.

#=============================== 
 fcasts_C <- fit %>%
   forecast(h = "1 year") %>%
   bind_rows(ensemble) %>%
   filter(.model == "COMBINATION") 
 
 fcasts_E <- fit %>%
   forecast(h = "1 year") %>%
   bind_rows(ensemble) %>%
   filter(.model == "ENSEMBLE") 
 
 waldo::compare(fcasts_C$.mean, fcasts_E$.mean, tolerance = 1e-10)
`old`: 3854.49 3526.39 3879.44 3819.13 3844.06 3745.64 3960.79 4011.49 3983.66 4091.28 and 2 more...
`new`: 3853.82 3525.36 3879.37 3818.93 3843.94 3746.05 3961.55 4012.01 3984.11 4091.38           ...

However, I am trying on other data and they are the same. Here's a minimal reproducible example

#===============================
#options(max.print = .Machine$inD13teger.max)

options(pillar.print_max = Inf)
options(pillar.width = Inf)
options(digits = 12)
options(pillar.signif = 12)

library(fpp3)
library(distributional)

#===============================
algeria_economy <- global_economy %>%
  filter(Country == "Algeria")

# dynamic forecasts ETS, ARIMA, Combination
fc <- algeria_economy %>%
  model(
    ETS = ETS(Exports),
    ARIMA_SEARCH = ARIMA(Exports, ic = c("bic"), method = "CSS-ML", optim.control = list(maxit = 1000)
    )
  ) %>%
  mutate(Combination = (ETS + ARIMA_SEARCH) / 2) %>%
  forecast(h = 10, point_forecast = lst(.mean = mean))
fc

# filter Combination
Comb <- fc %>% filter(.model == "Combination")
Comb
Comb$.mean

# dynamic forecasts Ensemble
ensemble <- fc %>%
  filter(.model %in% c("ETS", "ARIMA_SEARCH")) %>%
  summarise(
    Exports = dist_mixture(Exports[1], Exports[2],
                           weights = c(0.5, 0.5)),
    .mean = mean(Exports),
    .sd = sqrt(variance(Exports))
  ) %>%
  mutate(.model = "Ensemble") %>%
  as_tibble() %>%
  select(.model, Year, Exports, .mean)
ensemble

# compare
waldo::compare(Comb$.mean, ensemble$.mean, tolerance = 1e-20)
#✔ No differences

Hi Mitch,

I have tried on other data sets in addition to the above examples, and the means are the same, also, for both the ENSEMBLE and COMBINATION!

I'd very much appreciate a response.

thanks,
Amarjit