PARTIAL REJECTION CONTROL FOR ROBUST VARIATIONAL INFERENCE IN SEQUENTIAL LATENT VARIABLE MODELS

Abstract

Effective variational inference crucially depends on a flexible variational family of distributions. Recent work has explored sequential Monte Carlo (SMC) methods to construct variational distributions that can, in principle, approximate the target posterior arbitrarily well, which is especially appealing for models with inherent sequential structure. However, SMC, which represents the posterior using a weighted set of particles, often suffers from particle weight degeneracy, leading to a large variance of the resulting estimators. To address this issue, we present a novel approach that leverages the idea of partial rejection control (PRC) for developing a robust variational inference (VI) framework. In addition to developing a superior VI bound, we propose a novel marginal likelihood estimator constructed via a dice enterprise, a generalization of the Bernoulli factory, to obtain unbiased estimators for SMC-PRC. The resulting variational lower bound can be optimized efficiently with respect to the variational parameters and generalizes several existing approaches in the VI literature into a single framework. We establish theoretical properties of the lower bound and report experiments on various sequential models, such as the Gaussian state-space model and the variational RNN, on which our approach outperforms existing methods.

1. INTRODUCTION

Exact inference in latent variable models is usually intractable. Markov chain Monte Carlo (MCMC) (Andrieu et al., 2003) and variational inference (VI) methods (Blei et al., 2017) are commonly employed in such models to make inference tractable. While MCMC has been the traditional method of choice, often with provable guarantees, optimization-based VI methods have also enjoyed considerable recent interest due to their excellent scalability to large datasets. VI is based on maximizing a lower bound constructed from a marginal likelihood estimator. For latent variable models with sequential structure, sequential Monte Carlo (SMC) (Doucet & Johansen, 2009) returns a much lower variance estimator of the log marginal likelihood than importance sampling (Bérard et al., 2014; Cérou et al., 2011). In this work, we focus on designing a low-variance, unbiased, and computationally efficient estimator of the marginal likelihood.

The performance of SMC-based methods depends strongly on the choice of the proposal distribution. Inadequate proposal distributions propose values in low-probability regions under the target, leading to particle depletion (Doucet & Johansen, 2009). An effective solution is rejection control (Liu et al., 1998; Peters et al., 2012), which applies an approximate rejection sampling step within SMC to reject samples with low importance weights. In this work, we leverage the idea of partial rejection control (PRC) within the framework of SMC-based VI for sequential latent variable models. To this end, we construct a novel lower bound, VSMC-PRC, and propose an efficient optimization strategy for selecting the variational parameters. Compared to other recent SMC-based VI approaches (Naesseth et al., 2017; Maddison et al., 2017; Le et al., 2017), our approach includes a built-in accept-reject mechanism within SMC to prevent particle depletion.
The use of accept-reject within SMC makes the particle weight intractable; we therefore use a generalization of the Bernoulli factory (Asmussen et al., 1992) to construct unbiased estimators of the marginal likelihood for SMC-PRC. Although the idea of combining VI with a built-in accept-reject mechanism is not new (Salimans et al., 2015; Ruiz & Titsias, 2019; Grover et al., 2018; Gummadi, 2014), a key distinction of our approach is to incorporate the accept-reject mechanism within a resampling framework. In contrast to standard sampling algorithms that may reject the entire stream of particles, we apply a partial accept-reject to only the most recent update, increasing sampling efficiency. Further, the variational framework of SMC-PRC is interesting in itself as it combines accept-reject with particle filter methods. Our proposed bound, VSMC-PRC, therefore generalizes several existing approaches, for example Variational Rejection Sampling (VRS) (Grover et al., 2018), FIVO (Maddison et al., 2017), IWAE (Burda et al., 2015), and standard variational Bayes (Blei et al., 2017). Another key distinction is that, while existing approaches using Bernoulli factories are limited to niche one-dimensional toy examples, our proposed approach is scalable. To the best of our knowledge, no prior work has used Bernoulli factories in a setting as general as variational recurrent neural networks (VRNN); we therefore believe this aspect to be a significant contribution as well.

The rest of the paper is organized as follows. In Section 2, we provide a brief review of SMC, partial rejection control, and the dice enterprise. In Section 3, we introduce our VSMC-PRC bound, provide new theoretical insights into the Monte Carlo estimator, and design efficient ways to optimize it. Finally, we discuss related work and present experiments on the Gaussian state-space model (SSM) and VRNN.
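To give intuition for the Bernoulli factory idea referenced above, the sketch below shows the simplest classical example: given flips of a coin with unknown bias $p$, simulate an exact draw from Bernoulli($p^2$) without ever estimating $p$. This is only an illustration of the basic principle; it is not the dice-enterprise construction used in this paper, and the function names are our own.

```python
import random

def bernoulli_factory_square(coin):
    """Given a callable `coin` returning Bernoulli(p) flips with p unknown,
    return one exact Bernoulli(p^2) sample: flip twice, output 1 iff both are 1."""
    return coin() & coin()

# Hypothetical coin with bias p = 0.7 (p is hidden from the factory).
p = 0.7
coin = lambda: int(random.random() < p)

random.seed(0)
n = 200_000
est = sum(bernoulli_factory_square(coin) for _ in range(n)) / n
# est should be close to p**2 = 0.49
```

The same principle, generating exact samples whose success probability is a nonlinear function of unknown probabilities, underlies the dice enterprise: a multivariate generalization used here to produce unbiased weight estimates despite the intractable acceptance normalizer.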

2. BACKGROUND

We denote a sequence of $T$ real-valued observations as $x_{1:T} = (x_1, x_2, \ldots, x_T)$, and assume there is an associated sequence of latent variables $z_{1:T} = (z_1, z_2, \ldots, z_T)$. We are interested in inferring the posterior distribution of the latent variables, i.e., $p(z_{1:T}|x_{1:T})$; this task is, in general, intractable. Throughout the paper we use common notation from the SMC and VI literature: $z^i_t$ denotes the $i$-th particle at time $t$; $A^i_{t-1}$ the ancestor variable for the $i$-th particle at time $t$; and $\theta$ and $\phi$ the model and variational parameters, respectively.

2.1. SEQUENTIAL MONTE CARLO WITH PARTIAL REJECTION CONTROL

An SMC sampler approximates a sequence of densities $\{p_\theta(z_{1:t}|x_{1:t})\}_{t=1}^{T}$ through a set of $N$ weighted samples generated from a proposal distribution. Let the proposal density be
$$q_\phi(z_{1:T}|x_{1:T}) = \prod_{t=1}^{T} q_\phi(z_t|x_{1:t}, z_{1:t-1}). \quad (1)$$
Consider time $t-1$, at which we have uniformly weighted samples $\{N^{-1}, z^i_{1:t-1}, A^i_{t-1}\}_{i=1}^{N}$ estimating $p_\theta(z_{1:t-1}|x_{1:t-1})$. We want to estimate $p_\theta(z_{1:t}|x_{1:t})$ such that particles with a low importance weight are automatically rejected. PRC achieves this by using an approximate rejection sampling step (Liu et al., 1998; Peters et al., 2012). The overall procedure is as follows:

1. Generate $z^i_t \sim q_\phi(z_t|x_{1:t}, z^{A^i_{t-1}}_{1:t-1})$, where $i = 1, 2, \ldots, N$.

2. Accept $z^i_t$ with probability
$$a_{\theta,\phi}(z^i_t|z^{A^i_{t-1}}_{1:t-1}, x_{1:t}) = \left(1 + \frac{M(i, t-1)\, q_\phi(z^i_t|x_{1:t}, z^{A^i_{t-1}}_{1:t-1})}{p_\theta(x_t, z^i_t|x_{1:t-1}, z^{A^i_{t-1}}_{1:t-1})}\right)^{-1},$$
where $M(i, t-1)$ is a hyperparameter controlling the acceptance rate (see Proposition 3 and Section 3.3 for more details). Note that PRC applies accept-reject only to $z^i_t$, not to the entire trajectory.

3. If $z^i_t$ is rejected, go to step 1.

4. The new incremental importance weight of the accepted sample is $\alpha_t(z^i_{1:t}) = c^i_t\, Z(z^{A^i_{t-1}}_{1:t-1}, x_{1:t})$, where
$$c^i_t = \frac{p_\theta(x_t, z^i_t|x_{1:t-1}, z^{A^i_{t-1}}_{1:t-1})}{q_\phi(z^i_t|x_{1:t}, z^{A^i_{t-1}}_{1:t-1})\, a_{\theta,\phi}(z^i_t|z^{A^i_{t-1}}_{1:t-1}, x_{1:t})}.$$
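The propose/accept-reject loop above can be sketched in a few lines. The snippet below is a minimal illustration for a hypothetical one-dimensional Gaussian model (the densities `log_p` and `log_q` and all parameter values are our own stand-ins, not the models from the experiments); it implements steps 1-4 for a single time step and returns the tractable factor $c^i_t$ of each incremental weight. The intractable normalizer $Z(\cdot)$ is deliberately omitted, since handling it is the role of the dice-enterprise estimator developed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z, x_t):
    """Unnormalized log p_theta(x_t, z_t | past) for a toy model:
    prior N(z; 0, 1) and likelihood N(x_t; z, 1)."""
    return -0.5 * z**2 - 0.5 * (x_t - z)**2

def log_q(z, mu_q=0.0, sig_q=1.5):
    """Log density of a Gaussian proposal q_phi(z_t | past) = N(mu_q, sig_q^2)."""
    return -0.5 * ((z - mu_q) / sig_q) ** 2 - np.log(sig_q) - 0.5 * np.log(2 * np.pi)

def prc_step(x_t, N=8, M=1.0, mu_q=0.0, sig_q=1.5):
    """One PRC propagation step for N particles at a single time t."""
    particles, c = [], []
    for _ in range(N):
        while True:
            z = rng.normal(mu_q, sig_q)                 # step 1: propose from q_phi
            # step 2: acceptance probability a = (1 + M * q(z) / p(z))^(-1)
            a = 1.0 / (1.0 + M * np.exp(log_q(z, mu_q, sig_q) - log_p(z, x_t)))
            if rng.uniform() < a:
                break                                   # accepted
            # step 3: rejected -> propose again
        # step 4: tractable weight factor c_t = p / (q * a);
        # the factor Z(...) is intractable and estimated separately
        c.append(np.exp(log_p(z, x_t) - log_q(z, mu_q, sig_q)) / a)
        particles.append(z)
    return np.array(particles), np.array(c)

z, c = prc_step(x_t=0.5)
```

Note that because the acceptance probability divides back into the weight in step 4, a sample that barely survives the accept-reject test (small $a$) receives a correspondingly larger $c^i_t$, which is what keeps the scheme consistent.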

