MULTIVARIATE PROBABILISTIC TIME SERIES FORECASTING VIA CONDITIONED NORMALIZING FLOWS

Abstract

Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever-increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited for this problem, but multivariate models often assume a simple parametric distribution and do not scale to high dimensions. In this work we model the multivariate temporal dynamics of time series via an autoregressive deep learning model, where the data distribution is represented by a conditioned normalizing flow. This combination retains the power of autoregressive models, such as good performance in extrapolation into the future, with the flexibility of flows as a general-purpose high-dimensional distribution model, while remaining computationally tractable. We show that it improves over the state-of-the-art for standard metrics on many real-world data sets with several thousand interacting time series.

1. INTRODUCTION

Classical time series forecasting methods such as those in Hyndman & Athanasopoulos (2018) typically provide univariate forecasts and require hand-tuned features to model seasonality and other parameters. Time series models based on recurrent neural networks (RNN), like the LSTM (Hochreiter & Schmidhuber, 1997), have become popular due to their end-to-end training, the ease of incorporating exogenous covariates, and their automatic feature extraction abilities, which are the hallmarks of deep learning. Forecasting outputs can either be points or probability distributions, in which case the forecasts typically come with uncertainty bounds. Modeling uncertainty in time series forecasting is of vital importance for assessing how much to trust the predictions in downstream tasks, such as anomaly detection or (business) decision making. Without probabilistic modeling, a forecast in a region of low noise (small variance around a mean value) cannot be distinguished from one in a region of high noise. Point estimation models therefore ignore the risk stemming from this noise, which is of particular importance in contexts such as making (business) decisions. Finally, individual time series are in many cases statistically dependent on each other, and models need the capacity to adapt to this in order to improve forecast accuracy (Tsay, 2014). For example, to model the demand for a retail article, it is important not only to model its sales dependent on its own past sales, but also to take into account the effect of interacting articles, which can lead to cannibalization effects in the case of article competition. As another example, consider traffic flow in a network of streets as measured by occupancy sensors. A disruption on one particular street will also ripple to occupancy sensors of nearby streets; a univariate model would arguably not be able to account for these effects.
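The point about noise can be made concrete with a small numerical sketch (not from the paper; the data and noise scales below are purely illustrative): two forecasters that predict the same mean are indistinguishable under a point metric like MSE, but a proper probabilistic score such as the Gaussian negative log-likelihood rewards the one whose predicted variance matches the actual noise level.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observations y under N(mu, sigma^2)."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)

# Illustrative observations with small noise around a mean of 1.0.
y = np.array([1.2, 0.8, 1.1, 0.9])
mu = np.ones_like(y)  # both forecasters predict the same mean

# Point metric: identical for both forecasters, since it only sees the mean.
mse = np.mean((y - mu)**2)

# Probabilistic metric: the forecaster whose sigma matches the actual
# noise level scores better than one with overly wide uncertainty bands.
nll_calibrated = np.mean(gaussian_nll(y, mu, sigma=0.15))
nll_vague = np.mean(gaussian_nll(y, mu, sigma=1.0))

print(f"MSE: {mse:.4f}")
print(f"NLL (calibrated sigma): {nll_calibrated:.4f}")
print(f"NLL (vague sigma):      {nll_vague:.4f}")
```

The point metric assigns both forecasters the same score, while the likelihood-based score separates them; this is the sense in which point estimation models ignore risk.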
In this work, we propose end-to-end trainable autoregressive deep learning architectures for probabilistic forecasting that explicitly model multivariate time series and their temporal dynamics by employing a normalizing flow, like the Masked Autoregressive Flow (Papamakarios et al., 2017) or

