How do I optimize the phi damping parameter in the trend component of the exponential smoothing state space ETS() function from the fable time-series forecasting package?

I have 32 months of data, and I'm testing models that forecast unit transitions to the dead state "X" for months 13-32 by training on the transition data for months 1-12. I then compare the forecasts with the actual data for months 13-32. The data represent the migration of a beginning population into the dead state over 32 months; not all of the beginning units die off, only a portion. I understand that 12 months of training data isn't much, and that forecasting 20 months ahead from those 12 months should produce a wide distribution of outcomes. I'm using Hyndman, R.J., & Athanasopoulos, G. (2021), Forecasting: Principles and Practice, 3rd edition, OTexts: Melbourne, Australia, as a resource text.

I'm working through Section 8.2 (Methods with trend), Figure 8.4, of the book with my data. The text states, "We have set the damping parameter to a relatively low number (ϕ = 0.90) to exaggerate the effect of damping for comparison" (phi being the parameter added to the trend component of the ETS function). I noticed that phi has a very large effect on the fit and the resulting simulations.
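That large effect makes sense: for the damped trend method, the h-step point forecast adds (ϕ + ϕ² + ⋯ + ϕ^h) times the last trend estimate to the level, a sum that levels off near ϕ/(1 − ϕ). A quick standalone sketch (my own, not the book's code) of that trend multiplier by horizon:

phi_vals <- c(0.80, 0.90, 0.98)
h <- 1:20
# Cumulative damping multiplier phi + phi^2 + ... + phi^h at each horizon
sapply(phi_vals, function(p) cumsum(p^h))

At horizon 20 the multipliers are roughly 4, 8, and 16, so small changes in ϕ translate into large changes in the projected trend.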

In the code at the bottom I optimize phi so that the model forecasts for months 13-32 align as closely as possible with the actual data for those months. The idea is that, when faced with a similar exponential decay curve for months 1-12 (and lacking data beyond month 12), I would use the same phi value for forecasting past month 12. Is this a statistically valid approach? Robust? Or is this an example of rookie overfitting? Should I even be toying with the phi parameter in model fitting?

When I run simulations with the optimized phi value of 0.9757943, I get results that I am pleased with, given my experience with similar curves:

Code:

library(dplyr)
library(fabletools)
library(fable)
library(feasts)
library(ggplot2)
library(tidyr)
library(tsibble)

DF <- data.frame(
  Month  = 1:32,
  StateX = c(
    9416, 6086, 4559, 3586, 2887, 2175, 1945, 1675, 1418, 1259, 1079, 940,
    923, 776, 638, 545, 547, 510, 379, 341, 262, 241, 168, 155, 133, 76,
    69, 45, 17, 9, 5, 0)
) |>
  as_tsibble(index = Month)

myFunction <- function(x) {
  # Fit ETS(A,Ad,N) on the log scale to months 1-12, with phi fixed at x
  fit_A_Ad_N <- DF[1:12, ] |>
    model(ETS(log(StateX) ~ error("A") + trend("Ad", phi = x) + season("N")))
  # Generate 5000 bootstrapped sample paths over the 20-month horizon
  sim_A_Ad_N <- fit_A_Ad_N |> generate(h = 20, times = 5000, bootstrap = TRUE)
  # Total the forecast transitions within each path, then average across paths
  agg_A_Ad_N <- as.data.frame(sim_A_Ad_N) |>
    group_by(.rep) |>
    summarise(sum_FC = sum(.sim), .groups = "drop")
  mean_A_Ad_N <- round(mean(agg_A_Ad_N$sum_FC), 0)
  # Actual total transitions over months 13-32
  fc_actuals <- sum(DF$StateX[13:32])
  # Objective: absolute gap between the simulated mean total and the actual total
  abs(mean_A_Ad_N - fc_actuals)
}

# The objective is stochastic (bootstrap simulations), so results vary a
# little between runs; call set.seed() first for reproducibility.
opt <- optimize(myFunction, lower = 0, upper = 1)
opt
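The minimizing phi can then be fixed in a final model for the simulations. A short sketch reusing the objects above (fit_best is my own name, not from the book):

fit_best <- DF[1:12, ] |>
  model(ETS(log(StateX) ~ error("A") + trend("Ad", phi = opt$minimum) + season("N")))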


It may help to think of this abstractly; a code sketch follows the list.

1. Choose a metric to measure the model, such as AIC.
2. Apply the model multiple times, varying the phi or phi_range argument to the trend() part of ETS().
3. Select the model with the lowest AIC (for example).
4. Choose a metric, such as RMSE, to evaluate the fit against the test set.
5. Repeat with the next-best AIC to see whether the fit is as good, better, or worse.
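In code, those steps might look like the sketch below. It reuses the DF tsibble from the question (months 1-12 as the training set, 13-32 as the test set); phi_grid and results are my own names, and the grid stays inside fable's default phi_range of 0.8-0.98:

phi_grid <- seq(0.80, 0.98, by = 0.02)

results <- lapply(phi_grid, function(p) {
  # Step 2: refit with each candidate phi fixed in trend()
  fit <- DF[1:12, ] |>
    model(ETS(log(StateX) ~ error("A") + trend("Ad", phi = p) + season("N")))
  data.frame(
    phi  = p,
    AIC  = glance(fit)$AIC,                                       # steps 1/3
    RMSE = fit |> forecast(h = 20) |> accuracy(DF) |> pull(RMSE)  # step 4
  )
}) |> bind_rows()

results[order(results$AIC), ]   # step 3: lowest AIC first
results[order(results$RMSE), ]  # steps 4-5: how each fit fares on the test set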

Section 8.6 (Estimation and model selection) of the book suggests a method for ETS model selection by minimizing the AIC, using the fable package, as I demonstrate in the image below. Does this get at what you suggest in your response? It proposes ETS(M,Ad,N) with phi = 0.8899..., which I circled below. If I run simulations with this and compare them to actuals for forecast periods 13-32, the results look rather "tight" compared with the wider deviations I usually see in this sort of data. Loosening that damping parameter just a bit, to 0.90, throws results back into the realm of my experience with outcomes for complete curves. Also, do you think my approach of pretending I have only 12 months of data is OK, or statistically hokey? I very often deal with limited data and don't have complete curves.

[Image: fable model selection output, with ETS(M,Ad,N) and phi = 0.8899 circled]
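For reference, the section 8.6 selection step in code is just an unrestricted ETS() call; a minimal sketch (fit_auto is my name):

fit_auto <- DF[1:12, ] |>
  model(ETS(log(StateX)))

report(fit_auto)  # prints the chosen specification (e.g. ETS(M,Ad,N)) and phi
glance(fit_auto)  # AIC/AICc/BIC, for comparison with the fixed-phi fits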

I don't see the 12-month approach as pretending at all for the purpose of developing a model to be tested against later data. And there's nothing hokey about looking at how nudging phi in the sixth decimal place makes the curve look, so long as nothing else is swinging wildly. This is an exercise to find a useful model among the universe of models, all of which are wrong in terms of reality.

It looks like you've got a good grip on Hyndman's text and packages.
