WHEN RIGID COHERENCY HURTS: DISTRIBUTIONAL COHERENCY REGULARIZATION FOR PROBABILISTIC HIERARCHICAL TIME SERIES FORECASTING

Abstract

Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have hierarchical relations. Previous works all assume rigid consistency over the given hierarchies and do not adapt to real-world data that deviate from this assumption. Moreover, recent state-of-the-art neural probabilistic methods impose hierarchical relations only on point predictions and samples of the predictive distribution. This does not ensure that the full forecast distributions are coherent with the hierarchy and leads to poorly calibrated forecasts. We close both these gaps and propose PROFHIT, a probabilistic hierarchical forecasting model that jointly models forecast distributions over the entire hierarchy. PROFHIT (1) uses a flexible probabilistic Bayesian approach and (2) introduces a soft distributional coherency regularization that enables end-to-end learning of the entire forecast distribution by leveraging information from the underlying hierarchy. This yields robust and calibrated forecasts as well as adaptation to real-life data with varied hierarchical consistency. PROFHIT provides 41-88% better accuracy and 23-33% better calibration over a wide range of dataset consistency. Furthermore, PROFHIT can robustly provide reliable forecasts even when up to 10% of the input time-series data is missing, whereas the performance of other methods degrades severely, by over 70%.

1. INTRODUCTION

Time-series forecasting is an important problem that impacts decision-making in a wide range of applications. In many real-world situations, the time-series have inherent hierarchical relations and structures. Examples include forecasting time-series of employment (Taieb et al., 2017) measured at different geographical scales, and epidemic forecasting (Reich et al., 2019) at the county, state, and country levels. Given a time-series dataset with underlying hierarchical relations, the goal of Hierarchical Time-series Forecasting (HTSF) is to generate accurate forecasts for all time-series by leveraging the hierarchical relations between them (Hyndman et al., 2011). Most previous methods do not provide well-calibrated forecasts for both so-called strongly and weakly consistent datasets. Previous HTSF methods assume that the time-series values of a dataset strictly satisfy the underlying hierarchical constraints and impose rigid coherency on generated forecasts, i.e., the forecasts strictly satisfy the hierarchical relations of the dataset. These methods can model datasets generated (Taieb et al., 2017) by first collecting data for the time-series of the leaf-level nodes and deriving the time-series of higher-level nodes from them. We call such data strongly consistent. For example, classical HTSF methods (Hyndman & Athanasopoulos, 2018) use a bottom-up or top-down approach in which all time-series at a single level of the hierarchy are modeled independently and the values of the other levels are derived using the aggregation function governing the hierarchy. In contrast, many real-world datasets are weakly consistent, i.e., they do not follow the strict constraints of the hierarchy.¹ Such data have an underlying data generation process that may follow a hierarchical set of constraints but may contain some deviations.
These deviations can be caused by factors such as measurement or reporting errors and asynchrony in data aggregation and revision pipelines, as frequently observed in epidemic forecasting (Adhikari et al., 2019). Most state-of-the-art HTSF methods are designed for strongly consistent datasets and impose rigid coherency constraints; they thus may not adapt to such deviations and can provide poor forecasts for weakly consistent datasets.
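The bottom-up scheme and the strong/weak consistency distinction can be made concrete with a toy example. The sketch below is our own illustration, not code from the paper; the summing matrix `S` and the helper names are hypothetical. It derives rigidly coherent forecasts from leaf forecasts and measures how far an observed node vector deviates from the hierarchy:

```python
import numpy as np

# Toy 3-node hierarchy: one aggregate node with two leaf children.
# The summing matrix S maps leaf values to the full node vector
# [total, leaf_1, leaf_2]; coherent vectors satisfy y = S @ b.
S = np.array([[1.0, 1.0],   # total = leaf_1 + leaf_2
              [1.0, 0.0],
              [0.0, 1.0]])

def bottom_up(leaf_forecasts):
    """Rigidly coherent forecasts for all nodes, derived from leaf forecasts."""
    return S @ leaf_forecasts

def consistency_gap(y):
    """Largest absolute violation of the hierarchy by an observed node vector y.
    Near zero for strongly consistent data; larger for weakly consistent data."""
    return float(np.max(np.abs(y - S @ y[1:])))

strong = np.array([30.0, 10.0, 20.0])   # total equals the sum of the leaves
weak = np.array([28.5, 10.0, 20.0])     # e.g. a reporting error at the top node
```

A bottom-up forecast `bottom_up(b)` is coherent by construction, which is exactly why it cannot represent the small violations present in weakly consistent data such as `weak`.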

Table 1: Comparison of PROFHIT with state-of-the-art methods (rightmost column: PROFHIT).

Hierarchical Relations     × ✓ ✓ ✓ ✓ ✓
Strong & Weak Consistency  × × × × ✓ ✓
Distributional Coherency   × × ✓ × × ✓
End-to-end Learning        × × × ✓ ✓ ✓
Robust to missing data     × × × × × ✓

In this work, we fill this gap of learning well-calibrated and accurate forecasts for both strongly and weakly consistent datasets by leveraging the underlying hierarchical relations. We propose PROFHIT (Probabilistic Robust Forecasting for Hierarchical Time-series), a neural probabilistic HTSF method that provides an end-to-end Bayesian approach to jointly model the forecast distributions of all time-series (see Table 1 for a comparison). Specifically, we introduce a novel Soft Distributional Coherency Regularization (SDCR) to tackle this challenge. First, SDCR enables PROFHIT to leverage hierarchical relations over entire forecast distributions and generate calibrated forecasts by encouraging the forecast distribution of any parent node to be similar to the aggregation of its children's forecast distributions (Figure 1). Second, since SDCR is a soft constraint, our model is trained to adapt to datasets with varying hierarchical consistency, allowing it to trade off coherency for better accuracy and calibration on weakly consistent datasets. Our main contributions are: (1) Accurate and Calibrated Probabilistic Hierarchical Time-Series Forecasting: We propose PROFHIT, a deep probabilistic framework for jointly modeling the distributions of all time-series using a soft distributional coherency regularization (SDCR). PROFHIT leverages probabilistic deep-learning models to learn priors for individual time-series and refines the priors of all time-series using the hierarchy to provide accurate and well-calibrated forecasts. (2) Adaptation to Strong and Weak Consistency via Soft Distributional Coherency Regularization: SDCR imposes soft hierarchical constraints on the full forecast distributions, which helps the model adapt to varying levels of hierarchical consistency.
We build a novel refinement module over raw forecast priors and leverage multi-task learning over shared parameters, which enables PROFHIT to perform consistently well across the hierarchy. (3) Evaluation Across Multiple Datasets and with Missing Data: We show that PROFHIT outperforms a wide variety of state-of-the-art baselines on both accuracy and calibration, at all levels of the hierarchy, for both strongly and weakly consistent datasets. We also show that training with SDCR enables PROFHIT to leverage hierarchical relations to provide robust predictions even when values in the input time-series are missing.
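To illustrate the idea behind a soft distributional coherency penalty, the sketch below assumes Gaussian forecast distributions and independence across children, so the aggregate of the children is Gaussian with summed means and variances, and penalizes the KL divergence between the parent's forecast and that aggregate. This is our own simplified sketch, not the paper's exact SDCR formulation; the names `sdcr_penalty` and `lam` are illustrative:

```python
import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q))."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def sdcr_penalty(parent, children, lam=1.0):
    """Soft distributional coherency: divergence between a parent's forecast
    distribution and the aggregate of its children's forecast distributions.
    parent and each child are (mean, variance) pairs; assuming independent
    children, their sum has the summed mean and summed variance.
    lam controls how softly coherency is imposed."""
    mu_agg = sum(mu for mu, _ in children)
    var_agg = sum(var for _, var in children)
    return lam * kl_gauss(parent[0], parent[1], mu_agg, var_agg)

children = [(10.0, 2.0), (20.0, 3.0)]
coherent = sdcr_penalty((30.0, 5.0), children)   # parent matches aggregate -> 0.0
deviating = sdcr_penalty((28.5, 5.0), children)  # small positive penalty, not a hard violation
```

During training, such a penalty would be added to the forecasting likelihood loss with weight `lam`, so a weakly consistent dataset incurs a graded cost rather than a hard constraint, which is what allows trading off coherency against accuracy and calibration.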



¹ Note that we describe consistency over a dataset and coherency over model forecasts.



Figure 1: Regularizing forecasts using Distributional Coherency.

Moreover, previous methods do not focus on providing calibrated forecasts with precise uncertainty measures. Traditional methods focus on point predictions only. Recent methods like MINT (Wickramasuriya et al., 2019), ERM (Ben Taieb & Koo, 2019), and PEMBU (Taieb et al., 2017) refine raw independent forecast distributions as a post-processing step. This does not enable the models generating the raw forecasts to leverage the underlying hierarchical relations across time-series. End-to-end neural methods such as HIERE2E (Rangapuram et al., 2021) and SHARQ (Han et al., 2021) instead leverage hierarchical relations directly as part of the model architecture or learning algorithm. By imposing hierarchical constraints on the mean or fixed quantiles of the forecast distributions, they usually outperform post-processing methods. However, these methods do not enforce hierarchical coherency on the full distributions. Therefore, the forecasts may not be well-calibrated (Kuleshov et al., 2018), i.e., they produce unreliable prediction intervals that may not match observed probabilities from the ground truth (Fisch et al., 2022).
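To make the notion of calibration above concrete, the sketch below (our own illustration; `calibration_error` is a hypothetical helper, assuming Gaussian forecasts) computes a Kuleshov-style calibration error: the average gap between nominal confidence levels and the empirical frequency with which observations fall at or below the corresponding predicted quantiles:

```python
import math
import numpy as np

def gauss_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def calibration_error(y_true, mus, sigmas, levels=np.linspace(0.1, 0.9, 9)):
    """Mean absolute gap between nominal confidence levels and the empirical
    fraction of observations falling at or below the predicted quantiles."""
    # Probability integral transform: forecast CDF evaluated at each observation.
    pit = np.array([gauss_cdf(y, m, s) for y, m, s in zip(y_true, mus, sigmas)])
    empirical = np.array([(pit <= q).mean() for q in levels])
    return float(np.abs(empirical - levels).mean())

# Synthetic check: data drawn from N(0, 1).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=2000)
mus = np.zeros_like(y)
well = calibration_error(y, mus, np.ones_like(y))        # sigma matches the noise
over = calibration_error(y, mus, 0.3 * np.ones_like(y))  # overconfident intervals
```

A model whose predicted variance matches the data noise scores near zero, while overconfident forecasts (intervals too narrow) score much worse, which is the failure mode the text attributes to enforcing coherency only on means or fixed quantiles.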


