How do you do time-series forecasting? Best practices, tidy ways etc

Every time I need to do a time-series forecasting, I use the tool that I know well: the forecast package and functions from there (most commonly ets).

The problem is that I find it to be a pain in the ass to work with ts and xts objects and go out of my way to construct a model.

There is gotta be a better way!

How do you do time-series forecasting? What are your best practices?

Check out sweep:

Actually, the posts on bizScience in general are great resources. @davis can speak to this better than I can, but tibbletime, and timetk might also be helpful.

2 Likes

This is exactly the reason why our initial package, tidyquant, was built. Everything since then has been a step in the direction of making conversion (xts to tibble to ts and back with timetk) and usage (forecast then tidy with sweep) easier for the end user.

Currently, sweep and timetk (formerly timekit in that post above) should help you out if you use forecast a lot. In the future, we hope to build an extension to tibbletime that plays nicely with the forecast package directly.

2 Likes

Thanks!
Yes, I read through all of those. Sweep is just a broom for time-series. tibbletime and timetk are helpful, but you still have to do all the acrobatic moves of jumping into ts objects and all that. I probably need to spend more time studying the guides...

You are correct that there is currently no solution that doesn't require you to change to ts or xts at some point to perform the forecast.

That being said, you can kind of stay in the tidyverse if you use list columns and keep your original tibble and your ts objects in different columns. This especially works well if you have multiple models you want to run on different groups. This vignette explains.

2 Likes

Yeah, that's my biggest pain point. For some reason, I just hate these manipulations. Well, hoping for the best, and will definitely investigate the above mentioned packages more

Yes I do use nested data frames and nest my ts objects into columns. It becomes tricky as I forecast revenue for 30+ branches, and some branches have different start date, and it makes it more difficult to create valid ts objects, but that's the pain I have to deal with :slight_smile:

If you reallllllllly want to....

It's kind of annoying that most forecast package functions are not generics. This makes it difficult for us to extend the package to data.frames and other new objects.

library(tibbletime)
library(dplyr)
library(forecast)

# as.ts() is generic, let's go ahead and make a tbl_time method
as.ts.tbl_time <- function(x, ...) {
  index_quo <- tibbletime::get_index_quo(x)
  index_col <- tibbletime::get_index_col(x)
  
  x_no_index <- dplyr::select(x, - !! index_quo)
  
  start <- dplyr::first(index_col)
  end   <- dplyr::last(index_col)
  
  ts(data = x_no_index, start = start, end = end, ...)
}

# Every forecast package function is not a generic for some reason :(
# Force it to be one anyways for the sake of the example
ets <- function(y, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
    gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
    biasadj = FALSE, lower = c(rep(1e-04, 3), 0.8), upper = c(rep(0.9999, 3), 0.98), 
    opt.crit = c("lik", "amse", "mse", "sigma", "mae"), nmse = 3,
    bounds = c("both", "usual", "admissible"), ic = c("aicc", "aic", "bic"),
    restrict = TRUE, allow.multiplicative.trend = FALSE,
    use.initial.values = FALSE, ...) {
  UseMethod("ets")
}

# The default just calls the forecast version directly
ets.default <- function(y, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
                        gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
                        biasadj = FALSE, lower = c(rep(1e-04, 3), 0.8), upper = c(rep(0.9999, 3), 0.98), 
                        opt.crit = c("lik", "amse", "mse", "sigma", "mae"), nmse = 3,
                        bounds = c("both", "usual", "admissible"), ic = c("aicc", "aic", "bic"),
                        restrict = TRUE, allow.multiplicative.trend = FALSE,
                        use.initial.values = FALSE, ...) {
  
  forecast::ets(y, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
                gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
                biasadj = FALSE, lower = c(rep(1e-04, 3), 0.8), upper = c(rep(0.9999, 3), 0.98), 
                opt.crit = c("lik", "amse", "mse", "sigma", "mae"), nmse = 3,
                bounds = c("both", "usual", "admissible"), ic = c("aicc", "aic", "bic"),
                restrict = TRUE, allow.multiplicative.trend = FALSE,
                use.initial.values = FALSE, ...)
  
}

# The method for tbl_time first coerces to ts
ets.tbl_time <- function(y, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
                         gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
                         biasadj = FALSE, lower = c(rep(1e-04, 3), 0.8), upper = c(rep(0.9999, 3), 0.98), 
                         opt.crit = c("lik", "amse", "mse", "sigma", "mae"), nmse = 3,
                         bounds = c("both", "usual", "admissible"), ic = c("aicc", "aic", "bic"),
                         restrict = TRUE, allow.multiplicative.trend = FALSE,
                         use.initial.values = FALSE, ...) {
  
  # Pass ... to as.ts() so you can control frequency
  # May or may not work / be useful
  y <- as.ts(y, ...)
  
  forecast::ets(y, model = "ZZZ", damped = NULL, alpha = NULL, beta = NULL,
                gamma = NULL, phi = NULL, additive.only = FALSE, lambda = NULL,
                biasadj = FALSE, lower = c(rep(1e-04, 3), 0.8), upper = c(rep(0.9999, 3), 0.98), 
                opt.crit = c("lik", "amse", "mse", "sigma", "mae"), nmse = 3,
                bounds = c("both", "usual", "admissible"), ic = c("aicc", "aic", "bic"),
                restrict = TRUE, allow.multiplicative.trend = FALSE,
                use.initial.values = FALSE)
}

# Let's try it out
data(FB)

FB %>%
  select(date, adjusted) %>%
  as_tbl_time(date) %>%
  ets()
#> ETS(M,A,N) 
#> 
#> Call:
#>  forecast::ets(y = y, model = "ZZZ", damped = NULL, alpha = NULL,  
#> 
#>  Call:
#>      beta = NULL, gamma = NULL, phi = NULL, additive.only = FALSE,  
#> 
#>  Call:
#>      lambda = NULL, biasadj = FALSE, lower = c(rep(1e-04, 3),  
#> 
#>  Call:
#>          0.8), upper = c(rep(0.9999, 3), 0.98), opt.crit = c("lik",  
#> 
#>  Call:
#>          "amse", "mse", "sigma", "mae"), nmse = 3, bounds = c("both",  
#> 
#>  Call:
#>          "usual", "admissible"), ic = c("aicc", "aic", "bic"),  
#> 
#>  Call:
#>      restrict = TRUE, allow.multiplicative.trend = FALSE, use.initial.values = FALSE) 
#> 
#>   Smoothing parameters:
#>     alpha = 0.9999 
#>     beta  = 1e-04 
#> 
#>   Initial states:
#>     l = 28.0112 
#>     b = 0.0932 
#> 
#>   sigma:  0.0307
#> 
#>      AIC     AICc      BIC 
#> 12491.56 12491.60 12517.99

Created on 2018-01-18 by the reprex package (v0.1.1.9000).

1 Like

This is actually pretty nifty (not a full reprex, needs the above code block too)

library(tidyr)
library(purrr)

data(FANG)

FANG_nest <- FANG %>%
  select(symbol, date, adjusted) %>%
  group_by(symbol) %>%
  as_tbl_time(date) %>% 
  nest()

FANG_nest
#> # A tibble: 4 x 2
#>   symbol data                
#>   <chr>  <list>              
#> 1 FB     <tibble [1,008 × 2]>
#> 2 AMZN   <tibble [1,008 × 2]>
#> 3 NFLX   <tibble [1,008 × 2]>
#> 4 GOOG   <tibble [1,008 × 2]>

FANG_nest %>%
  mutate(model = map(data, ets))
#> # A tibble: 4 x 3
#>   symbol data                 model    
#>   <chr>  <list>               <list>   
#> 1 FB     <tibble [1,008 × 2]> <S3: ets>
#> 2 AMZN   <tibble [1,008 × 2]> <S3: ets>
#> 3 NFLX   <tibble [1,008 × 2]> <S3: ets>
#> 4 GOOG   <tibble [1,008 × 2]> <S3: ets>
3 Likes

If you typically use ets() then I presume you are most interested in exponential smoothing SSMs (or something similar)? FYI, you can fit related, but not exactly the same, SSMs with the MARSS package, which only requires the data to be of class matrix without any conversion to a ts object. Although designed for multivariate time series, many univariate models can be fit and forecasted as well (e.g., [un]biased random walks, DLMs). If interested, check out the vignettes in here.

1 Like

Recently Facebook has released an open source library for time series forecasting, calling it - with characteristic immodesty, "prophet" - https://facebook.github.io/prophet/.

It is also available as an R package on CRAN, and seems to work nicely with data frames. You may want to give it a try.

1 Like

Very interesting! I'll definitely try it out!