Top down Hierarchical time series forecast, Receiving NAs for high level series

myles · February 2, 2022, 9:32pm

EDITED FOR REPRODUCIBILITY

Good afternoon everyone, I'm encountering the following problem with the top-down forecasting proportions (tdfp) method for forecasting a hierarchical time series in R.

Background: 10.4 Top-down approaches | Forecasting: Principles and Practice (2nd ed)

My data has seven levels, which makes the bottom level series very wide so it's hard to post in this forum.

The neural networks method for forecasting top down proportions works very well, and produces bottom level time series data correctly. (I would like to put embedded media here but forum rules won't allow it.)

However the auto arima top down forecast produces NAs for the higher level data, while producing normal forecasts for the 5th level time series data and below. This is what that looks like:

If I remove the top-down stipulation from the hierarchical time series forecast code then I get a normal auto arima forecast. (I would like to put embedded media here but forum rules won't allow it.)

Has anyone encountered this problem before and, if so, what can I do to get the higher level forecast series to show up? Auto arima has typically been my most accurate forecasting method. Thank you in advance for your help.

Here is the code you can use to reproduce the error:

library(hts)
library(forecast)
library(readxl)
library(readr)

## This needs to be edited to point to the Github repository.

urlfile <- "https://raw.githubusercontent.com/myleshungerford/TopDownHierarchicalForecasting/main/SIS_Time_Series.csv"

SIS_Time_Series <- read_csv(url(urlfile))

View(SIS_Time_Series)

SIS_Start_Year <- 2017

freq <- 20

## These variables need to be updated/checked every time you run the model
Current_Year <- 2022
Current_cycle_month <- 5
current_month_name <- "FEB01"
## End of updated Variables

## These variables are derived, they do NOT need to be updated every month
Forecast_horizon <- (freq - Current_cycle_month)
nodes <- list(2, rep(2, 2), rep(2, 4), rep(2, 8), rep(2, 16), rep(2, 32))
## End of variables

## The following code will generate the alternate top-down time series for SIS

SISts <- ts(SIS_Time_Series, start = c(SIS_Start_Year, 1), end = c(Current_Year, Current_cycle_month), frequency = freq)

y <- hts(SISts, nodes = nodes)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) ets(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_ets_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp",FUN = function(x) hw(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_hw_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) stlf(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_stlf_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) nnetar(x, repeats = 200))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_nn_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

allf <- forecast(y, h = Forecast_horizon, method = "tdfp", FUN = function(x) auto.arima(x))
fcdf <- as.data.frame(allts(allf))
write.table(fcdf, file=paste(current_month_name, "forecastSIS_aa_tdfp.csv", sep = "_"), sep=",", row.names=FALSE)

robjhyndman · February 3, 2022, 2:40am

Without a reproducible example, it will be hard to offer any help.

myles · February 3, 2022, 7:46pm

I have made edits for reproducibility.

myles · February 3, 2022, 9:02pm

Update 1:
I asked a colleague to run the code on their machine and it also produced NA's for AA.

I then asked another colleague to run the code on their machine and, strangely, it produced an AA forecast normally, without NA's for the higher level data.

myles · February 3, 2022, 9:03pm

This second colleague has an older version of R Studio, installed through Anaconda.

R version 3.6.1 (2019-07-05)

robjhyndman · February 3, 2022, 9:15pm

Many of the series consist entirely of zeros. So when the top-down proportions are being computed, there are 0/0 calculations leading to NaNs. Top-down proportions can't be used if you can't compute the proportional allocations throughout the hierarchy. If you use some other method, such as "wls", the results do not contain NaNs.

myles · February 3, 2022, 9:15pm

Why then does the AA method work alright on older versions of R?

myles · February 3, 2022, 9:23pm

Additionally, the wls method is in appropriate because what we're forecasting here is a marketing funnel. The wls method doesn't respond well to the fact that increases or decreases in higher level data usually lead to increases or decreases further down the funnel.

Prospects	Started	Submitted	Completed	Admitted	Deposited	Enrolled
2133	623	182	19	19	17	0
2583	929	368	19	18	17	0
3036	1289	781	19	18	17	0
3266	1500	1073	97	18	17	0
3411	1571	1138	788	18	17	0
3468	1602	1159	883	43	17	0
3537	1625	1173	993	874	26	0
3626	1660	1190	1025	934	32	0
3727	1686	1209	1052	964	84	3
3780	1707	1218	1067	983	164	34
3828	1732	1232	1073	995	231	155
3861	1753	1238	1084	1003	235	175
3897	1771	1243	1101	1018	242	191
3922	1783	1249	1110	1027	248	198
3939	1793	1250	1124	1040	255	211
3959	1808	1252	1127	1041	259	219
3973	1815	1255	1128	1043	262	225
3988	1824	1257	1133	1049	261	228
3995	1828	1258	1136	1051	261	228
4001	1834	1258	1130	1046	255	228
![image	464x421](upload://cQEWboighssgnRebd5PLyhR8To5.png)

robjhyndman · February 3, 2022, 9:34pm

I can't easily test old versions of the package or R, so I don't know how it was avoiding the 0/0 calculation.
You could get around it by changing the zeros to some small number (maybe add 1 to all values).

I don't understand your comment about wls being inappropriate. Any type of reconciliation should adjust weak lower level trends to match strong upper level trends. If the lower level trends are obvious in the data, they should be modelled by the base method in any case.

myles · February 3, 2022, 9:41pm

Perhaps I'm not understanding something regarding your suggestion to use wls. I get the following error when I switch to this code:

allf <- forecast(y, h = Forecast_horizon, method = "mls", FUN = function(x) auto.arima(x))

Error in match.arg(method) :
'arg' should be one of “comb”, “bu”, “mo”, “tdgsa”, “tdgsf”, “tdfp”

In any case, the problem is not that there is a weak lower level trend, the problem is that there is a weak higher level trend. The current prospects, starteds, and completeds are down by 30% compared to last year but this hasn't yet had an effect on the lowest level forecast yet when it should. AA tdfp is correctly forecasting a decrease in this lowest level of the marketing funnel, but not displaying the higher level data - only as NA's.

myles · February 3, 2022, 9:45pm

Confirmed that it appears to be an issue with the updated version of R.

Installed R 4.1.2, RStudio 2021.09.2, packages hts,forecast,readxl,readr. Files generated NA.
Installed R 3.6.1 & related packages/RTools. Files generated correctly.

myles · February 3, 2022, 9:52pm

Replacing all 0's with 1's appears to work fine as a temporary workaround:

robjhyndman · February 3, 2022, 9:58pm

Please read the help file. To use wls, set method="comb" and weights="wls". This is the default, so omitting both method and weights will give you WLS.

myles · February 3, 2022, 10:16pm

Thank you for the idea, but method ='comb' and weights = 'wls' still doesn't make the bottom of the marketing funnel responsive to increases or decreases in the top of the funnel.

Your idea of replacing the 0's with 1's appears to be a good workaround for now, although I don't know why tdfp would work fine with 0's in older versions of R but not the newer versions.

system · February 24, 2022, 10:16pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.