COMPRESSED PREDICTIVE INFORMATION CODING

Abstract

Unsupervised learning plays an important role in many fields, such as machine learning, data compression, and neuroscience. Methods for extracting low-dimensional structure from dynamic data lag behind those for static data. We developed a novel information-theoretic framework, Compressed Predictive Information Coding (CPIC), to extract predictive latent representations from dynamic data. Predictive information quantifies the ability to predict the future of a time series from its past. CPIC selectively projects the past (input) into a low-dimensional space that is predictive of the compressed data projected from the future (output). The key insight of our framework is to learn representations by balancing the minimization of compression complexity with the maximization of predictive information in the latent space. We derive tractable variational bounds on the CPIC loss by leveraging bounds on mutual information. The CPIC loss induces the latent space to capture information that is maximally predictive of the future of the data from the past. We demonstrate that introducing stochasticity in the encoder and maximizing the predictive information in the latent space contribute to learning more robust latent representations. Furthermore, our variational approaches perform better at mutual information estimation than estimates made under the commonly used Gaussian assumption. We show numerically on synthetic data that CPIC can recover dynamical systems embedded in noisy observations with a low signal-to-noise ratio. Finally, we demonstrate that, compared with other state-of-the-art representation learning models, CPIC extracts features that are more predictive both for forecasting exogenous variables and for auto-forecasting on various real datasets. Together, these results indicate that CPIC will be broadly useful for extracting low-dimensional dynamic structure from high-dimensional, noisy time series data.

1. INTRODUCTION

Unsupervised methods play an important role in learning representations that provide insight into data and exploit unlabeled data to improve performance in downstream tasks in diverse application areas (Bengio et al., 2013; Chen et al., 2020; Grill et al., 2020; Devlin et al., 2018; Brown et al., 2020; Baevski et al., 2020; Wang et al., 2020). Prior work on unsupervised representation learning can be broadly categorized into generative models, such as variational autoencoders (VAEs) (Kingma & Welling, 2013) and generative adversarial networks (GANs) (Goodfellow et al., 2014), and discriminative models, such as dynamical components analysis (DCA) (Clark et al., 2019), contrastive predictive coding (CPC) (Oord et al., 2018), and deep autoencoding predictive components (DAPC) (Bai et al., 2020). Generative models focus on capturing the joint distribution between representations and inputs, but are usually computationally expensive. On the other hand, discriminative models emphasize capturing the dependence of data structure in the low-dimensional latent space, and are therefore easier to scale to large datasets. In the case of time series, some representation learning models take advantage of an estimate of mutual information between the encoded past (input) and the future (output) (Creutzig & Sprekeler, 2008; Creutzig et al., 2009; Oord et al., 2018). Although previous models utilizing mutual information extract low-dimensional representations, they tend to be sensitive to noise in the observational space. DCA directly makes use of the mutual information between the past and the future (i.e., the predictive information (Bialek et al., 2001)) in a latent representational space that is a linear embedding of the observation data. However, DCA operates under Gaussian assumptions for mutual information
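As background for the Gaussian assumption mentioned above, the following is a minimal sketch of predictive information in the simplest possible setting: a scalar, stationary Gaussian process with one-step past and future windows. In that case the mutual information between past and future reduces to -0.5 * log(1 - rho^2), where rho is the lag-1 correlation. The AR(1) process, the function names, and all parameter values are illustrative assumptions and are not taken from DCA, CPIC, or any published implementation.

```python
import math
import random


def simulate_ar1(phi, n, noise_std=1.0, seed=0):
    """Simulate a scalar AR(1) process x[t] = phi * x[t-1] + Gaussian noise.

    Assumes |phi| < 1 so that the process is stationary.
    """
    rng = random.Random(seed)
    # Draw the initial state from the stationary distribution.
    x = rng.gauss(0.0, noise_std / math.sqrt(1.0 - phi ** 2))
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, noise_std)
        series.append(x)
    return series


def lag1_correlation(series):
    """Sample correlation between x[t] (past) and x[t+1] (future)."""
    past, future = series[:-1], series[1:]
    mean_p = sum(past) / len(past)
    mean_f = sum(future) / len(future)
    cov = sum((p - mean_p) * (f - mean_f) for p, f in zip(past, future))
    var_p = sum((p - mean_p) ** 2 for p in past)
    var_f = sum((f - mean_f) ** 2 for f in future)
    return cov / math.sqrt(var_p * var_f)


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of two jointly Gaussian scalars
    with correlation coefficient rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


# Hypothetical example: for an AR(1) process the true lag-1 correlation is phi.
series = simulate_ar1(phi=0.9, n=20000)
rho_hat = lag1_correlation(series)
pi_hat = gaussian_mutual_information(rho_hat)
pi_true = gaussian_mutual_information(0.9)
print(f"estimated: {pi_hat:.3f} nats, analytic: {pi_true:.3f} nats")
```

The closed form holds only when past and future are jointly Gaussian. For data that violate this, the plug-in estimate can be biased, which is the limitation that variational mutual information bounds are intended to address.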

