Top down Hierarchical time series forecast, Receiving NAs for high level series

EDITED FOR REPRODUCIBILITY

Good afternoon everyone, I'm encountering the following problem with the top-down forecasting proportions (tdfp) method for forecasting a hierarchical time series in R.

Background: 10.4 Top-down approaches | Forecasting: Principles and Practice (2nd ed)

My data has seven levels, which makes the bottom level series very wide so it's hard to post in this forum.

The neural networks method for forecasting top down proportions works very well, and produces bottom level time series data correctly. (I would like to put embedded media here but forum rules won't allow it.)

However the auto arima top down forecast produces NAs for the higher level data, while producing normal forecasts for the 5th level time series data and below. This is what that looks like:

If I remove the top-down stipulation from the hierarchical time series forecast code then I get a normal auto arima forecast. (I would like to put embedded media here but forum rules won't allow it.)

Has anyone encountered this problem before and, if so, what can I do to get the higher level forecast series to show up? Auto arima has typically been my most accurate forecasting method. Thank you in advance for your help.

Here is the code you can use to reproduce the error:

library(hts)
library(forecast)
library(readxl)
library(readr)

## This needs to be edited to point to the Github repository.

urlfile <- "https://raw.githubusercontent.com/myleshungerford/TopDownHierarchicalForecasting/main/SIS_Time_Series.csv"

SIS_Time_Series <- read_csv(url(urlfile))

View(SIS_Time_Series)

SIS_Start_Year <- 2017

freq <- 20

## These variables need to be updated/checked every time you run the model
Current_Year <- 2022
Current_cycle_month <- 5
current_month_name <- "FEB01"
## End of updated Variables

## These variables are derived, they do NOT need to be updated every month
Forecast_horizon <- (freq - Current_cycle_month)
nodes <- list(2, rep(2, 2), rep(2, 4), rep(2, 8), rep(2, 16), rep(2, 32))
## End of variables

## The following code will generate the alternate top-down time series for SIS

SISts <- ts(SIS_Time_Series, start = c(SIS_Start_Year, 1), end = c(Current_Year, Current_cycle_month), frequency = freq)

y <- hts(SISts, nodes = nodes)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) ets(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_ets_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp",FUN = function(x) hw(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_hw_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) stlf(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_stlf_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) nnetar(x, repeats = 200))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_nn_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) auto.arima(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_aa_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

Without a reproducible example, it will be hard to offer any help.

I have made edits for reproducibility.

Update 1:
I asked a colleague to run the code on their machine and it also produced NA's for AA.

I then asked another colleague to run the code on their machine and, strangely, it produced an AA forecast normally, without NA's for the higher level data.

This second colleague has an older version of R Studio, installed through Anaconda.
image

R version 3.6.1 (2019-07-05)

Many of the series consist entirely of zeros. So when the top-down proportions are being computed, there are 0/0 calculations leading to NaNs. Top-down proportions can't be used if you can't compute the proportional allocations throughout the hierarchy. If you use some other method, such as "wls", the results do not contain NaNs.

Why then does the AA method work alright on older versions of R?

Additionally, the wls method is in appropriate because what we're forecasting here is a marketing funnel. The wls method doesn't respond well to the fact that increases or decreases in higher level data usually lead to increases or decreases further down the funnel.

Prospects Started Submitted Completed Admitted Deposited Enrolled
2133 623 182 19 19 17 0
2583 929 368 19 18 17 0
3036 1289 781 19 18 17 0
3266 1500 1073 97 18 17 0
3411 1571 1138 788 18 17 0
3468 1602 1159 883 43 17 0
3537 1625 1173 993 874 26 0
3626 1660 1190 1025 934 32 0
3727 1686 1209 1052 964 84 3
3780 1707 1218 1067 983 164 34
3828 1732 1232 1073 995 231 155
3861 1753 1238 1084 1003 235 175
3897 1771 1243 1101 1018 242 191
3922 1783 1249 1110 1027 248 198
3939 1793 1250 1124 1040 255 211
3959 1808 1252 1127 1041 259 219
3973 1815 1255 1128 1043 262 225
3988 1824 1257 1133 1049 261 228
3995 1828 1258 1136 1051 261 228
4001 1834 1258 1130 1046 255 228
![image 464x421](upload://cQEWboighssgnRebd5PLyhR8To5.png)

I can't easily test old versions of the package or R, so I don't know how it was avoiding the 0/0 calculation.
You could get around it by changing the zeros to some small number (maybe add 1 to all values).

I don't understand your comment about wls being inappropriate. Any type of reconciliation should adjust weak lower level trends to match strong upper level trends. If the lower level trends are obvious in the data, they should be modelled by the base method in any case.

Perhaps I'm not understanding something regarding your suggestion to use wls. I get the following error when I switch to this code:

allf <- forecast(y, h = Forecast_horizon, method = "mls", FUN = function(x) auto.arima(x))

Error in match.arg(method) :
'arg' should be one of “comb”, “bu”, “mo”, “tdgsa”, “tdgsf”, “tdfp”

In any case, the problem is not that there is a weak lower level trend, the problem is that there is a weak higher level trend. The current prospects, starteds, and completeds are down by 30% compared to last year but this hasn't yet had an effect on the lowest level forecast yet when it should. AA tdfp is correctly forecasting a decrease in this lowest level of the marketing funnel, but not displaying the higher level data - only as NA's.

Confirmed that it appears to be an issue with the updated version of R.

Installed R 4.1.2, RStudio 2021.09.2, packages hts,forecast,readxl,readr. Files generated NA.
Installed R 3.6.1 & related packages/RTools. Files generated correctly.

Replacing all 0's with 1's appears to work fine as a temporary workaround:

Please read the help file. To use wls, set method="comb" and weights="wls". This is the default, so omitting both method and weights will give you WLS.

Thank you for the idea, but method ='comb' and weights = 'wls' still doesn't make the bottom of the marketing funnel responsive to increases or decreases in the top of the funnel.

Your idea of replacing the 0's with 1's appears to be a good workaround for now, although I don't know why tdfp would work fine with 0's in older versions of R but not the newer versions.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.