WHEN RIGID COHERENCY HURTS: DISTRIBUTIONAL COHERENCY REGULARIZATION FOR PROBABILISTIC HIERARCHICAL TIME SERIES FORECASTING

Abstract

Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting, where the goal is to model and forecast multivariate time-series that have hierarchical relations. Previous works all assume rigid consistency over the given hierarchies and do not adapt to real-world data that deviate from this assumption. Moreover, recent state-of-the-art neural probabilistic methods impose hierarchical relations on point predictions and on samples from the predictive distribution. This does not ensure that the full forecast distributions are coherent with the hierarchy, and it leads to poorly calibrated forecasts. We close both these gaps and propose PROFHIT, a probabilistic hierarchical forecasting model that jointly models forecast distributions over the entire hierarchy. PROFHIT (1) uses a flexible probabilistic Bayesian approach and (2) introduces a soft distributional coherency regularization that enables end-to-end learning of the entire forecast distribution by leveraging information from the underlying hierarchy. This enables robust and calibrated forecasts as well as adaptation to real-life data with varying degrees of hierarchical consistency. PROFHIT provides 41-88% better accuracy and 23-33% better calibration across datasets spanning a wide range of consistency. Furthermore, PROFHIT robustly provides reliable forecasts even when up to 10% of the input time-series data is missing, whereas the performance of other methods degrades by over 70%.
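The distinction between data that rigidly satisfy a hierarchy and real-world data that deviate from it can be illustrated with a minimal Python sketch. It assumes summation as the aggregation function; the four-leaf hierarchy, the helper names `bottom_up` and `max_deviation`, and all values are hypothetical, and the deviation measure is an illustrative diagnostic, not the consistency measure or code used by PROFHIT.

```python
# Minimal sketch (hypothetical hierarchy and values): sum aggregation,
# strongly vs. weakly consistent data. Not PROFHIT's implementation.
from typing import Dict, List

# Hypothetical hierarchy: total = A + B, A = A1 + A2, B = B1 + B2.
HIERARCHY: Dict[str, List[str]] = {
    "total": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
}


def bottom_up(leaves: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Derive every internal node by summing its children at each time step.

    Data built this way are strongly consistent by construction.
    """
    series = dict(leaves)  # copy so the caller's leaf dict is not mutated

    def build(node: str) -> List[float]:
        if node not in series:
            children = [build(child) for child in HIERARCHY[node]]
            # zip(*children) groups the children's values per time step.
            series[node] = [sum(values) for values in zip(*children)]
        return series[node]

    build("total")
    return series


def max_deviation(series: Dict[str, List[float]]) -> float:
    """Hypothetical diagnostic: largest |parent - sum(children)| over all
    parents and time steps (0.0 means the hierarchy holds exactly)."""
    return max(
        abs(series[parent][t] - sum(series[child][t] for child in children))
        for parent, children in HIERARCHY.items()
        for t in range(len(series[parent]))
    )


leaves = {"A1": [1.0, 2.0], "A2": [3.0, 4.0], "B1": [5.0, 6.0], "B2": [7.0, 8.0]}
strong = bottom_up(leaves)
print(strong["total"])        # [16.0, 20.0]
print(max_deviation(strong))  # 0.0

# Simulate a reporting error at the top level: the data become weakly consistent.
weak = dict(strong)
weak["total"] = [16.5, 20.0]
print(max_deviation(weak))    # 0.5
```

Deriving parent series by summation, as classical bottom-up pipelines do, gives zero deviation by construction; a single misreported parent value, as in `weak`, is already enough to break rigid consistency.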

1. INTRODUCTION

Time-series forecasting is an important problem that impacts decision-making in a wide range of applications. In many real-world situations, the time-series have inherent hierarchical relations and structures. Examples include forecasting employment time-series (Taieb et al., 2017) measured at different geographical scales and epidemic forecasting (Reich et al., 2019) at the county, state and country levels. Given a time-series dataset with underlying hierarchical relations, the goal of Hierarchical Time-series Forecasting (HTSF) is to generate accurate forecasts for all time-series by leveraging the hierarchical relations between them (Hyndman et al., 2011).

Most previous methods do not provide well-calibrated forecasts for both so-called "strongly" and "weakly" consistent datasets. Previous HTSF methods assume that the time-series values of a dataset strictly satisfy the underlying hierarchical constraints, and they impose rigid coherency on the generated forecasts, i.e., the forecasts strictly satisfy the hierarchical relations of the dataset. These methods can model datasets generated (Taieb et al., 2017) by first collecting data for the time-series of the leaf-level nodes and then deriving the time-series of higher-level nodes from them. We call such data strongly consistent. For example, classical HTSF methods (Hyndman & Athanasopoulos, 2018) use a bottom-up or top-down approach, where all time-series at a single level of the hierarchy are modeled independently and the values at other levels are derived using the aggregation function governing the hierarchy.

In contrast, many real-world datasets are weakly consistent, i.e., they do not follow the strict constraints of the hierarchy¹. Such data have an underlying data-generation process that may follow a hierarchical set of constraints but may contain some deviations. These deviations can be caused by factors such as measurement or reporting errors and asynchrony in data aggregation and revision pipelines, as frequently observed in



¹ Note that we use the term consistency for datasets and the term coherency for model forecasts.

