Forecasting with tsibble and fable

Hi,

I am new to tsibble and fable and am trying to do time series forecasting with fable.
I have a dataset with 4 variables: Date (as 2020 01), Location (USA, Canada, and 6 more), Brands (A, B, C, D), and Sales (in quantities).

I first converted this data frame to a tsibble object as follows:

df <- df%>%
    mutate(Date = yearmonth(as.character(Date)))%>%
    as_tsibble(key = c(Location, Brands), index = Date)

After this I tested modeling, forecasting (5 years ahead) and plotting the forecast for a single time series (Brand A in the USA). That worked fine, including the forecast plots. However, when I try to model all the time series at once with the following code, it only partially works: the mable is displayed as normal, but the errors shown below are thrown before it is displayed.

Code:

fit <- df %>%
  model(
    snaive = SNAIVE(Sales ~ lag("year")),
    ets = ETS(Sales),
    arima = ARIMA(Sales)
  )
fit

Error:
8 errors (1 unique) encountered for snaive
[8] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using tsibble::fill_gaps() if required.
8 errors (1 unique) encountered for ets
[8] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using tsibble::fill_gaps() if required.
8 errors (1 unique) encountered for arima
[8] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using tsibble::fill_gaps() if required.

I have checked the data and I can't find any missing values at all.
Because of this error, I believe the forecasts are not being computed correctly; the forecast plot is also flat.

Can you please help? Thank you!

Temporal missingness (gaps) can be found using the tsibble::scan_gaps() function. If this returns a non-empty tsibble, then your data contains gaps which the above models are not designed to handle.

This may also happen when you misrepresent the temporal structure of your data. If you have monthly data, you should use a yearmonth() to represent it. Using dates will imply the values are taken for the 1st of each month, with every other day in that month missing.
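The difference between the two representations can be checked directly. A minimal sketch, assuming a made-up four-month series (the data frame and the names as_dates and as_months are illustrative only, not from the original data):

```r
library(tsibble)

# Hypothetical monthly data stored as first-of-month dates
monthly <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01")),
  Sales = c(10, 12, 9, 14)
)

# Indexed by Date: the interval is inferred as daily, so every other
# day of each month counts as an implicit gap
as_dates <- as_tsibble(monthly, index = Date)

# Indexed by yearmonth: the interval is monthly and the series is complete
monthly_ym <- monthly
monthly_ym$Date <- yearmonth(monthly_ym$Date)
as_months <- as_tsibble(monthly_ym, index = Date)

has_gaps(as_dates)   # .gaps is TRUE
has_gaps(as_months)  # .gaps is FALSE
```

Printing each tsibble also shows the inferred interval in its header ([1D] versus [1M]), which is a quick way to confirm how the index is being interpreted.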

Thanks @mitchelloharawild!

I am still not very clear on this concept. tsibble::scan_gaps(df) doesn't show any empty data at all. As for the monthly data, I am using yearmonth() as mentioned earlier, with the following code (Date is just the variable name):

df <- df%>%
    mutate(Date = yearmonth(as.character(Date)))%>%
    as_tsibble(key = c(Location, Brands), index = Date)

I am not sure what I am doing wrong, but I am very new to time series in general. Would you recommend using some other package instead of fable? I was trying to learn fable because I understood it to be the tidy way of forecasting.

Any guidance will be appreciated! Thank you!

Without knowing what df contains, it is difficult to find why this error is being triggered.
The code which checks for gaps is:

any(tsibble::has_gaps(x)[[".gaps"]])

What does tsibble::has_gaps(df) return?
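For data keyed by Location and Brands, the same check can be run per series. A minimal sketch, assuming a small made-up dataset in which brand B has no row for February (the values and the names gap_flags and gap_runs are illustrative only):

```r
library(tsibble)

# Hypothetical keyed monthly data: brand B is missing 2020 Feb
sales <- tsibble(
  Date     = yearmonth(c("2020 Jan", "2020 Feb", "2020 Mar", "2020 Jan", "2020 Mar")),
  Location = "USA",
  Brands   = c("A", "A", "A", "B", "B"),
  Sales    = c(5, 7, 6, 3, 4),
  key      = c(Location, Brands),
  index    = Date
)

# One row per series, with a logical .gaps column
gap_flags <- has_gaps(sales)
gap_flags

# The check quoted above: TRUE if any series has a gap
any(gap_flags[[".gaps"]])

# One row per run of missing periods, with its start, end and length
gap_runs <- count_gaps(sales)
gap_runs
```

count_gaps() is usually the more informative of the two, since it shows where each gap starts and ends rather than only which series are affected.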

Thanks @mitchelloharawild!
Below is a small dataset I created from my original data. With this sample I get different errors when modeling all the time series, but they occur in the same place. With my entire dataset, I get the error shown below. When I model only a specific brand and location from the original data, no errors appear and a nice forecast plot is produced, but I have only tried that with one particular time series.
Error with the original (entire) dataset:
8 errors (1 unique) encountered for snaive
[8] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using tsibble::fill_gaps() if required.
8 errors (1 unique) encountered for ets
[8] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using tsibble::fill_gaps() if required.
8 errors (1 unique) encountered for arima
[8] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using tsibble::fill_gaps() if required.

library(tidyverse)
library(readxl)
library(tsibble)
library(fable)
#> Loading required package: fabletools
library(urca)

# Sample dataset with just first few rows
df <- data.frame(
  stringsAsFactors = FALSE,
                                Date = c("01/01/2015","01/01/2015",
                                         "01/01/2015","01/01/2015","01/01/2015",
                                         "01/01/2015","01/01/2015","01/01/2015",
                                         "01/01/2015","01/01/2015"),
                            Location = c("Brazil","Brazil","Brazil",
                                         "Brazil","Canada","Canada","Canada",
                                         "Canada","Ecuador","Ecuador"),
                              Brands = c("B","C","G","O","B","C","G",
                                         "O","B","C"),
                               Sales = c(86056,76479,140247,116197,981,
                                         54881,46822,-6491,16525,17043)
                  )
# Separating Date into Year and Month
df <- df%>%
    separate(Date, into = c("Month", "Day", "Year"), sep = '/')%>%
    select(-Day)%>%
    unite(Date, Year, Month, sep = " " )

## Creating tsibble object
df <- df%>%
    mutate(Date = yearmonth(as.character(Date)))%>%
    as_tsibble(key = c(Location, `Brands`), index = Date)

# Modeling all the series
fit <- df %>%
  model(
    snaive = SNAIVE(Sales ~ lag("year")),
    ets = ETS(Sales),
    arima = ARIMA(Sales)
  )
#> Warning: 10 errors (1 unique) encountered for snaive
#> [10] invalid 'times' argument
#> Warning: 10 errors (1 unique) encountered for ets
#> [10] only 1 case, but 2 variables
#> Warning: 10 errors (1 unique) encountered for arima
#> [10] missing value where TRUE/FALSE needed
fit
#> Warning: `...` is not empty.
#> 
#> We detected these problematic arguments:
#> * `needs_dots`
#> 
#> These dots only exist to allow future extensions and should be empty.
#> Did you misspecify an argument?
#> # A mable: 10 x 5
#> # Key:     Location, Brands [10]
#>    Location Brands       snaive          ets        arima
#>    <chr>    <chr>       <model>      <model>      <model>
#>  1 Brazil   B      <NULL model> <NULL model> <NULL model>
#>  2 Brazil   C      <NULL model> <NULL model> <NULL model>
#>  3 Brazil   G      <NULL model> <NULL model> <NULL model>
#>  4 Brazil   O      <NULL model> <NULL model> <NULL model>
#>  5 Canada   B      <NULL model> <NULL model> <NULL model>
#>  6 Canada   C      <NULL model> <NULL model> <NULL model>
#>  7 Canada   G      <NULL model> <NULL model> <NULL model>
#>  8 Canada   O      <NULL model> <NULL model> <NULL model>
#>  9 Ecuador  B      <NULL model> <NULL model> <NULL model>
#> 10 Ecuador  C      <NULL model> <NULL model> <NULL model>

Created on 2020-07-16 by the reprex package (v0.3.0)

Does it work when you use:

df %>%
  fill_gaps() %>%
  model(
    snaive = SNAIVE(Sales ~ lag("year")),
    ets = ETS(Sales),
    arima = ARIMA(Sales)
  )

Note that ETS will not work because it does not handle missing values that fill_gaps() will introduce.

The reprex was not as helpful as it could be given that each of the models was estimated with just one observation (2015 Jan).

Just to confirm, you have complete monthly data for each of the 32 location-brand combinations? Is the number of rows in the tsibble 32 x # of months?

You were able to use the entire dataset to estimate one of the models for one of the combinations, but there could still be missing data for some of the others. For the estimation using the entire set, I am curious about the 8 identical errors for each method, which corresponds to the number of locations.

BTW, there is no reason for the Separating Date into Year and Month code. yearmonth(Date) will translate "01/01/2015" to "2015 Jan" without you removing the day.
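If the order of day and month in the strings is a concern, the date can be parsed explicitly before converting. A minimal sketch, assuming the strings are in month/day/year order as in the reprex (the names raw and ym are illustrative only):

```r
library(tsibble)

# Date strings in the same format as the reprex (assumed month/day/year)
raw <- c("01/01/2015", "02/01/2015", "03/01/2015")

# Parse with an explicit format, then collapse to a monthly index
ym <- yearmonth(as.Date(raw, format = "%m/%d/%Y"))
ym
```

Stating the format explicitly avoids any guessing for values such as "02/01/2015", which could otherwise be read as either 1 February or 2 January.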

Disclaimer: I am still working my way up to the intermediate level for the tidyverts, so my questions and comments may not be very useful.

Thanks @mitchelloharawild! Yes it works now and does give errors for ETS as you mentioned.

However, I still don't understand the concept of gaps. The data itself doesn't have any missing values. Is fill_gaps() altering some of the existing data, or is it treating some of the zero values as missing?

Anyhow, thanks for your help on fixing this!

Thanks @EconProf! I am still at the beginner level, so errors give me a hard time.

Thanks for the advice to leave Date as is. It was an unnecessary step that I didn't know I could skip. Thanks for helping, it saves some code :slight_smile:

A gap isn't an explicit missing value as in NA, it will be an implicit missing value.

For example, Jan -> Feb -> Mar -> Apr -> May -> Jun.
If April is missing entirely from that sequence, we consider it a 'gap' (implicitly missing observation). A gap like this is occurring somewhere in your data (tsibble::has_gaps() can help you find this).

Using the fill_gaps() function will add these missing rows into your data, and convert the gaps into explicit missing values (NA).
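The Jan to Jun example above can be reproduced in a few lines. A minimal sketch with made-up values (the object names sales, gaps and filled are illustrative only):

```r
library(tsibble)

# Hypothetical monthly series with no row at all for April
sales <- tsibble(
  Month = yearmonth(c("2020 Jan", "2020 Feb", "2020 Mar", "2020 May", "2020 Jun")),
  Sales = c(10, 12, 9, 14, 11),
  index = Month
)

anyNA(sales$Sales)  # FALSE: there are no explicit missing values
has_gaps(sales)     # .gaps is TRUE: April is implicitly missing

# List the missing time points
gaps <- scan_gaps(sales)
gaps

# Add the missing row, with Sales set to NA
filled <- fill_gaps(sales)
filled
```

This is why checking the data for NA values finds nothing: the problem is a row that does not exist, not a value that is missing.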

I see! Thanks for explaining this to me @mitchelloharawild!

I do see that for some brands certain months are entirely missing, probably because there were no sales at all. Can we fill in 0 instead of NA with fill_gaps()? It didn't accept that value for me. I think 0 is the right value for no sales, and it would also let me use ETS.

You can specify the values used to fill the gaps by passing expressions to the ... argument of fill_gaps().

For example, filling the kilo column with 0 values for gaps:

library(tsibble)
harvest <- tsibble(
  year = c(2010, 2011, 2013, 2011, 2012, 2014),
  fruit = rep(c("kiwi", "cherry"), each = 3),
  kilo = sample(1:10, size = 6),
  key = fruit, index = year
)

harvest %>%
  fill_gaps(kilo = 0L)
#> # A tsibble: 8 x 3 [1Y]
#> # Key:       fruit [2]
#>    year fruit   kilo
#>   <dbl> <chr>  <int>
#> 1  2011 cherry     7
#> 2  2012 cherry     9
#> 3  2013 cherry     0
#> 4  2014 cherry     2
#> 5  2010 kiwi       3
#> 6  2011 kiwi      10
#> 7  2012 kiwi       0
#> 8  2013 kiwi       4

Created on 2020-07-20 by the reprex package (v0.3.0)

Thank you @mitchelloharawild! You have been very helpful! Much appreciated!

After forecasting a time series with fable, the result is a fable object with a distribution column. How do we save these results in a data frame? Thanks!

I get the following error:

Error: Can't convert to .
Run rlang::last_error() to see where the error occurred.

A fable is a dataframe. You can drop the fable attributes using as_tibble().
Please provide a reproducible example for your error.

Thanks @mitchelloharawild! I have run the same code on the example below, which uses the household budget data (hh_budget):

# Libraries
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(fable)
library(forecast)

# Selecting Country and Unemployment
UnEmp <- tsibbledata::hh_budget%>%
  select(Year, Country, Unemployment)

#Plotting data
UnEmp%>%
    autoplot(Unemployment)

# Train-test split
train <- UnEmp %>%
  filter(Year <= 2010)

#Modeling 
fit <- train %>%
    model(
    snaive = SNAIVE(Unemployment ~ lag("year")),
    ets = ETS(Unemployment),
    arima = ARIMA(Unemployment)
  )%>%
  mutate(mixed = (ets + arima + snaive) / 3)

fit

#Forecasting
fc <- fit %>%
  forecast(h = 6) %>%
  autoplot(UnEmp, level = NULL)

# Accuracy Testing
accuracy(fc, UnEmp)

fc_accuracy <- accuracy(fc, UnEmp,
  measures = list(
    point_accuracy_measures,
    interval_accuracy_measures,
    distribution_accuracy_measures
  )
)

fc_accuracy %>%
  group_by(.model) %>%
  summarise(
    RMSE = mean(RMSE),
    MAE = mean(MAE),
    MASE = mean(MASE),
    Winkler = mean(winkler),
    CRPS = mean(CRPS)
  ) %>%
  arrange(RMSE)

In this example, the average (mixed) and snaive models give NaN values. In my original dataset, however, the average gives the best result and there are no NaN values. I would like to take the best model (arima in this example) as my final model, apply it to the next 3 unseen years (2017 - 2019), and then save the forecast values in a new data frame.

Thanks for your help!

The SNAIVE model is failing because your data isn't seasonal (data observed annually in this case).

Warning message:
4 errors (1 unique) encountered for snaive
[4] Non-seasonal model specification provided, use RW() or provide a different lag specification.

Because of this, your combination model (mixed) is averaging an errored model, and also producing bad results.

Note also in your code above you are saving the autoplot() to fc:

fc <- fit %>%
  forecast(h = 6) %>%
  autoplot(UnEmp, level = NULL)

You should save the forecasts to fc so that your accuracy() and other functions work as expected:


fc <- fit %>%
  forecast(h = 6)

fc %>%
  autoplot(UnEmp, level = NULL)
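For annual data like this, one way to keep a simple benchmark in the combination is to use NAIVE() in place of SNAIVE(). A minimal sketch, assuming that substitution is acceptable for the comparison (the model name naive and the object acc are illustrative choices; depending on the fable version, ARIMA() may need additional packages installed):

```r
library(dplyr)
library(tsibble)
library(tsibbledata)
library(fable)

UnEmp <- tsibbledata::hh_budget %>%
  select(Year, Country, Unemployment)

# Train-test split
train <- UnEmp %>%
  filter(Year <= 2010)

# NAIVE() replaces SNAIVE() because annual data has no seasonal period
fit <- train %>%
  model(
    naive = NAIVE(Unemployment),
    ets   = ETS(Unemployment),
    arima = ARIMA(Unemployment)
  ) %>%
  mutate(mixed = (ets + arima + naive) / 3)

# Keep the forecasts themselves in fc, and plot separately if needed
fc <- fit %>%
  forecast(h = 6)

# Test-set accuracy, averaged over countries for each model
acc <- accuracy(fc, UnEmp)
acc %>%
  group_by(.model) %>%
  summarise(RMSE = mean(RMSE)) %>%
  arrange(RMSE)
```

With every component model fitted successfully, the mixed model no longer averages over a failed model.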

This looks like it is on a different topic. Please ask a new question. I suggest you provide a reproducible example of the problem so people are able to provide help. As stated, there is little anyone can do.

Thanks @mitchelloharawild!

This data was used only as an example. My original data doesn't produce errors with the SNAIVE model because, as you mentioned, it is seasonal. I now save only the forecasts to fc as you suggested, and I get all the results mentioned earlier. But I am still struggling with the following:

  1. Average gives the best result in my original data. I would like to take this best result as my final model and would like to apply it on unseen future 3 years (2017 - 2019 for the above example) and then save the forecasted values in a new data frame.

  2. I am not able to save the forecast values in a separate data frame, which I would like to export as an Excel or csv file. I believe the .model and .distribution columns in the fable object are causing this problem.

  3. For the above example on the household budget data, if we were to select the arima model, how would we apply it to the next 3 years and then save the result as a data frame, Excel file or csv file? Could you please show this with example code?

Thank you so much for your help!

  1. This should be possible using the code you've used above. The errors will be treated as warnings to allow other models to continue without stopping.

  2. You should store the forecasts in a flat structure to save to a csv or excel file. The problematic column here would be your distribution. By default, it will store this complex structure as text, which will lose some detail. If you want, you can also extract the forecast mean using mean(<your distribution column>), and variance with variance(<your distribution column>). This will likely require updating your packages for this feature to work.

  3. Below. Note how the distribution column is now a character class, because the information is lost. You can use readr::write_rds() and readr::read_rds() to store the distributions, however this file format can only be used with R.

# Libraries
library(tidyverse)
library(tsibble)
library(tsibbledata)
library(fable)
#> Loading required package: fabletools
#> Registered S3 methods overwritten by 'fabletools':
#>   method      from 
#>   glance.NULL broom
#>   tidy.NULL   broom
# library(forecast) # Forecast package is not needed here.

# Selecting Country and Unemployment
UnEmp <- tsibbledata::hh_budget%>%
  select(Year, Country, Unemployment)

path <- tempfile()
UnEmp %>% 
  model(ARIMA(Unemployment)) %>% # Automatic ARIMA model
  forecast(h = "3 years") %>% # Forecast 3 years ahead
  write_csv(path) # Write to a csv

# Read in the CSV
read_csv(path)
#> Parsed with column specification:
#> cols(
#>   Country = col_character(),
#>   .model = col_character(),
#>   Year = col_double(),
#>   Unemployment = col_character(),
#>   .mean = col_double()
#> )
#> # A tibble: 12 x 5
#>    Country   .model               Year Unemployment .mean
#>    <chr>     <chr>               <dbl> <chr>        <dbl>
#>  1 Australia ARIMA(Unemployment)  2017 N(5.7, 0.24)  5.71
#>  2 Australia ARIMA(Unemployment)  2018 N(5.7, 0.48)  5.71
#>  3 Australia ARIMA(Unemployment)  2019 N(5.7, 0.73)  5.71
#>  4 Canada    ARIMA(Unemployment)  2017 N(7.2, 0.37)  7.20
#>  5 Canada    ARIMA(Unemployment)  2018 N(7.4, 0.8)   7.40
#>  6 Canada    ARIMA(Unemployment)  2019 N(7.5, 1)     7.53
#>  7 Japan     ARIMA(Unemployment)  2017 N(3.1, 0.12)  3.10
#>  8 Japan     ARIMA(Unemployment)  2018 N(3.3, 0.37)  3.29
#>  9 Japan     ARIMA(Unemployment)  2019 N(3.4, 0.51)  3.44
#> 10 USA       ARIMA(Unemployment)  2017 N(4.9, 0.6)   4.91
#> 11 USA       ARIMA(Unemployment)  2018 N(5.2, 1.6)   5.23
#> 12 USA       ARIMA(Unemployment)  2019 N(5.6, 2.3)   5.61

Created on 2020-07-29 by the reprex package (v0.3.0)
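The mean and variance extraction mentioned in point 2 can be sketched as follows. This assumes a recent version of fable in which the distribution column is named after the response (Unemployment here); the column names fc_mean and fc_variance and the object fc_flat are illustrative choices, and depending on the fable version ARIMA() may need additional packages installed:

```r
library(dplyr)
library(tsibble)
library(tsibbledata)
library(fable)

# Forecast 3 years ahead with an automatic ARIMA model
fc <- tsibbledata::hh_budget %>%
  select(Year, Country, Unemployment) %>%
  model(arima = ARIMA(Unemployment)) %>%
  forecast(h = "3 years")

# Flatten the fable: keep numeric summaries, drop the distribution column
fc_flat <- fc %>%
  as_tibble() %>%
  mutate(
    fc_mean     = mean(Unemployment),
    fc_variance = distributional::variance(Unemployment)
  ) %>%
  select(-Unemployment)

fc_flat

# Every remaining column is plain text or numeric, so it writes cleanly to csv
path <- tempfile(fileext = ".csv")
write.csv(fc_flat, path, row.names = FALSE)
```

Storing the mean and variance as separate numeric columns avoids relying on the text form of the distribution when the file is read back in.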