Hyndman's recipe

```
library(forecast)
retail <- read.csv("https://robjhyndman.com/data/ausretail.csv",header=FALSE)
retail <- ts(retail[,-1],f=12,s=1982+3/12)
ns <- ncol(retail)
h <- 24
fcast <- matrix(NA,nrow=h,ncol=ns)
for(i in 1:ns)
fcast[,i] <- forecast(retail[,i],h=h)$mean
write(t(fcast),file="retailfcasts.csv",sep=",",ncol=ncol(fcast))
```

requires understanding several steps. Since the 2013 post, new tools have appeared that make this easier.

Every `R` problem can be thought of, with advantage, as the interaction of three objects: an existing object, `x`, a desired object, `y`, and a function, `f`, that will return a value of `y` given `x` as an argument.

f(x) = y

Any or all of these three objects (in `R`, *everything* is an object) may contain other objects, including functions. We say that functions are composable, as in f(g(x)).
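A minimal base-R illustration of composition (the names `f` and `g` are placeholders, not part of the forecasting problem):

```
g <- function(x) x^2   # inner function runs first
f <- function(x) x + 1 # outer function consumes its result

f(g(3))  # 10
```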

In this case, `x` is a composite of the 2,000 products and their respective 36-element time series. `y` is a composite of 2,000 time series models, perhaps including the results of forecasting against a held-out portion of the data.

Before attempting to do anything 2,000 times at once, it is preferable to design a function `f` to do it once.

This begins with extracting an object for a single SKU from `x`. Each SKU is in its own row, a vector containing a (presumably character) SKU identifier and a numeric vector of length 36. As toy data:

```
SKU <- "blue towel"
dat <- 1:36
```

(`dat` in preference to `data` because the latter is a built-in function, and some operations give precedence to the latter.)

Assuming that `x` is in the global environment as a data frame `SKU`:

```
sku <- SKU[1, 2:37]
```

The subset operator is row first, column second. The lowercase name is intentional: it can be reused for every other row, since only one row at a time is involved.

This is the first opportunity to make a function

```
pick_one <- function(i) SKU[i, 2:37]  # i is the row index
```

and this revises the corresponding line above

```
sku <- pick_one(1)
```
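To exercise `pick_one()` end to end, here is a toy `SKU` data frame; the two SKU names and the filler values are invented for illustration:

```
# two SKUs, each with an id plus 36 monthly observations
SKU <- data.frame(id = c("blue towel", "red towel"),
                  matrix(seq_len(72), nrow = 2))

pick_one <- function(i) SKU[i, 2:37]

sku <- pick_one(1)
length(sku)  # 36
```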

Next, a function to convert `dat` into a time series object.

```
mk_ts <- function() ts(dat, start = c(2018,1), frequency = 12)
```

In a non-toy example, there would here be some error trapping for gaps, etc.

```
series <- mk_ts()
```
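One way to sketch that error trapping, assuming 36 complete monthly observations are expected (this variant takes the data as an argument, and the checks are illustrative, not exhaustive):

```
dat <- 1:36  # toy data, as above

mk_ts_checked <- function(x) {
  if (any(is.na(x)))   stop("series contains missing values")
  if (length(x) != 36L) stop("expected 36 monthly observations")
  ts(x, start = c(2018, 1), frequency = 12)
}

series <- mk_ts_checked(dat)
```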

The next step is to model `series` with one or more of the baseline models: `MEAN`, `NAIVE`, `SNAIVE`, and `RW` (drift is specified as `RW(y ~ drift())`). These come from {fable}, which loading {fpp3} attaches. The reason for that package will become obvious when it comes to running multiple models in one go:

```
# model() expects a tsibble; as_tsibble() on a ts names the measured column "value"
mk_model <- function() as_tsibble(series) %>% model(Naive = NAIVE(value))
the_model <- mk_model()
```

This requires {dplyr} or {magrittr} to be loaded for the `%>%` pipe operator; loading {fpp3} already attaches {dplyr}.
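The payoff of moving to {fpp3} is that several baseline models fit in a single `model()` call, and one `forecast()` covers them all. A sketch, using toy data in place of a real series (the model labels on the left are arbitrary names):

```
library(fpp3)  # attaches {fable}, {tsibble}, {feasts}, {dplyr}, ...

set.seed(1)
series <- ts(rnorm(36), start = c(2018, 1), frequency = 12)  # toy stand-in

fits <- as_tsibble(series) %>%
  model(Naive  = NAIVE(value),
        Mean   = MEAN(value),
        Drift  = RW(value ~ drift()),
        SNaive = SNAIVE(value))

fc <- fits %>% forecast(h = 12)  # one call forecasts every model
```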

Between `mk_ts()` and `mk_model()`, there should, of course, be diagnostics for autocorrelation and trend, which will have a bearing on the choice of model, and program-flow logic based on those results. Also consider inflation adjustments, if applicable, as well as log transformation or differencing. The best way to understand these is to work thoroughly through the examples in Hyndman's book for a single SKU, and then pick another few at random.
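A few of those diagnostics can be sketched with {feasts} (attached by {fpp3}); which features matter, and what to do about them, remains a judgment call:

```
library(fpp3)

set.seed(1)
tsb <- as_tsibble(ts(rnorm(36), start = c(2018, 1), frequency = 12))  # toy

tsb %>% features(value, feat_stl)         # trend and seasonal strength
tsb %>% features(value, unitroot_ndiffs)  # differences needed for stationarity
tsb %>% ACF(value)                        # autocorrelation function
```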

Given a model, the residual diagnostics must be considered. See Hyndman § 5.7.
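In {fpp3} terms, the standard residual checks look like this; a sketch, with toy data standing in for `the_model` built above:

```
library(fpp3)

set.seed(1)
series <- ts(rnorm(36), start = c(2018, 1), frequency = 12)  # toy
the_model <- as_tsibble(series) %>% model(Naive = NAIVE(value))

the_model %>% gg_tsresiduals()           # residual plot, ACF, histogram
augment(the_model) %>%
  features(.innov, ljung_box, lag = 10)  # portmanteau test on residuals
```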

When this is complete, thought should be given as to what model features should be captured and the appropriate object to contain them.

There is no golden road to forecasting even a single series. Although there are tests, not all of them should be automated, because their application may require judgment. Do not attempt to fully automate even model creation without an informed understanding of all the steps required to make forecasts that can be defended.
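Once the single-SKU workflow is understood and defensible, the scaled-up version is short, because `model()` fits every key group in one pass. A sketch, assuming a long-format tsibble with columns `sku`, `month`, and `qty` (all invented names, with three toy SKUs standing in for the 2,000):

```
library(fpp3)

set.seed(1)
# toy long-format data: 3 SKUs x 36 months
sales <- expand.grid(sku = c("a", "b", "c"), m = 1:36) %>%
  mutate(month = yearmonth("2018 Jan") + m - 1,
         qty   = rnorm(n(), mean = 100, sd = 10)) %>%
  select(sku, month, qty) %>%
  as_tsibble(index = month, key = sku)

# one model() call per key (SKU); one forecast() call for everything
fc <- sales %>%
  model(Naive  = NAIVE(qty),
        SNaive = SNAIVE(qty)) %>%
  forecast(h = 24)
```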