VARIATIONAL DYNAMIC MIXTURES

Abstract

Deep probabilistic time series forecasting models have become an integral part of machine learning. While several powerful generative models have been proposed, we provide evidence that their associated inference models are oftentimes too limited and cause the generative model to predict mode-averaged dynamics. Mode-averaging is problematic since many real-world sequences are highly multi-modal, and their averaged dynamics are unphysical (e.g., predicted taxi trajectories might run through buildings on the street map). To better capture multi-modality, we develop variational dynamic mixtures (VDM): a new variational family to infer sequential latent variables. The VDM approximate posterior at each time step is a mixture density network, whose parameters come from propagating multiple samples through a recurrent architecture. This results in an expressive multi-modal posterior approximation. In an empirical study, we show that VDM outperforms competing approaches on highly multi-modal datasets from different domains.

1. INTRODUCTION

Making sense of time series data is an important challenge in various domains, including ML for climate change. One important milestone to reach the climate goals is to significantly reduce the CO2 emissions from mobility (Rogelj et al., 2016). Accurate forecasting models of typical driving behavior and of typical pollution levels over time can help both lawmakers and automotive engineers to develop solutions for cleaner mobility. In these applications, no accurate physical model of the entire dynamic system is known or available. Instead, data-driven models, specifically deep probabilistic time series models, can be used to solve the necessary tasks, including forecasting. The dynamics in such data can be highly multi-modal. At any given part of the observed sequence, there might be multiple distinct continuations of the data that are plausible, but the average of these behaviors is unlikely, or even physically impossible. Consider for example a dataset of taxi trajectories [1]. In each row of Fig. 1a, we have selected 50 routes from the dataset with similar starting behavior (blue). Even though these routes are quite similar to each other in the first 10 waypoints, the continuations of the trajectories (red) can exhibit quite distinct behaviors and lead to points on any far edge of the map. The trajectories follow a few main traffic arteries; these could be considered the main modes of the data distribution. We would like to learn a generative model of the data that, based on some initial waypoints, can forecast plausible continuations for the trajectories. Many existing methods make restricting modeling assumptions such as Gaussianity to make learning tractable and efficient. But trying to capture the dynamics through unimodal distributions can lead either to "over-generalization" (i.e., putting probability mass in spurious regions) or to focusing only on the dominant mode and thereby neglecting important structure of the data.
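To make the mode-averaging problem concrete, the following minimal sketch (plain Python; the numbers are illustrative and not taken from the taxi dataset) evaluates a bimodal mixture of two Gaussians, standing in for "turn left" vs. "turn right" continuations. The mean of the mixture lies between the two modes, where the density is vanishingly small: a mode-averaged prediction lands exactly in this implausible region.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, components):
    """Density of a mixture; components is a list of (weight, mu, sigma)."""
    return sum(w * gauss_pdf(x, mu, s) for w, mu, s in components)

# Two equally weighted, well-separated modes.
bimodal = [(0.5, -3.0, 0.5), (0.5, 3.0, 0.5)]

mean = sum(w * mu for w, mu, _ in bimodal)  # 0.0: the mode-averaged prediction
print(mixture_pdf(-3.0, bimodal))  # high density at a true mode
print(mixture_pdf(mean, bimodal))  # near-zero density at the average
```

A unimodal (e.g., single-Gaussian) fit to this data must either center on the low-density average or cover both modes with an inflated variance, which is the "over-generalization" failure described above.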
Even neural approaches with very flexible generative models can fail to fully capture this multi-modality, because their capacity is often limited by the assumptions of their inference model. To address this, we develop variational dynamic mixtures (VDM). Its generative process is a sequential latent variable model. The main novelty is a new multi-modal variational family, which makes learning and inference multi-modal yet tractable. In summary, our contributions are:

• A new inference model. We establish a new type of variational family for variational inference of sequential latent variables. By successively marginalizing over previous latent states, the procedure can be carried out efficiently in a single forward pass and induces a multi-modal posterior approximation.
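The mechanism behind this variational family can be illustrated schematically. The sketch below is a toy, pure-Python analogue, not the paper's actual architecture: the recurrent cell, the parameter mappings, and all constants are hypothetical scalar stand-ins for learned networks. It shows the core idea stated in the abstract: several latent samples are each propagated through a recurrent update, and each resulting hidden state contributes one component (weight, mean, scale) of a mixture-of-Gaussians posterior at the current time step.

```python
import math
import random

random.seed(0)

def toy_cell(h, z, x):
    """Hypothetical scalar recurrent update (stand-in for a learned RNN cell)."""
    return math.tanh(0.5 * h + 0.8 * z + 0.3 * x)

def component_params(h):
    """Hypothetical mapping from a hidden state to one mixture component."""
    logit = 1.2 * h          # unnormalized mixture weight
    mu = 2.0 * h             # component mean
    sigma = 0.1 + abs(h)     # component scale, kept positive
    return logit, mu, sigma

def posterior_step(h_states, z_samples, x_t):
    """One inference step: propagate each sample through the cell and
    assemble the resulting mixture-of-Gaussians posterior over z_t."""
    new_h = [toy_cell(h, z, x_t) for h, z in zip(h_states, z_samples)]
    params = [component_params(h) for h in new_h]
    # Normalize mixture weights with a softmax over the logits.
    m = max(logit for logit, _, _ in params)
    exps = [math.exp(logit - m) for logit, _, _ in params]
    total = sum(exps)
    mixture = [(e / total, mu, s) for e, (_, mu, s) in zip(exps, params)]
    # Draw one new sample per component to carry forward to the next step.
    new_z = [random.gauss(mu, s) for _, mu, s in mixture]
    return new_h, new_z, mixture

K = 4  # number of propagated samples = number of mixture components
h, z = [0.0] * K, [random.gauss(0, 1) for _ in range(K)]
for x_t in [0.3, -0.7, 1.1]:  # a toy observed sequence
    h, z, mixture = posterior_step(h, z, x_t)
```

Because each of the K samples follows its own recurrent trajectory, the components can drift apart over time, so the per-step posterior approximation stays multi-modal in a single forward pass; a unimodal Gaussian posterior, by contrast, would collapse these branches into one.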

[1] https://www.kaggle.com/crailtap/taxi-trajectory

