APPROXIMATE PROBABILISTIC INFERENCE WITH COMPOSED FLOWS

Abstract

We study the problem of probabilistic inference on the joint distribution defined by a normalizing flow model. Given a pre-trained flow model p(x), we wish to estimate p(x2 | x1) for some arbitrary partitioning of the variables x = (x1, x2). We first show that this task is computationally hard for a large class of flow models. Motivated by this hardness result, we propose a framework for approximate probabilistic inference. Specifically, our method trains a new generative model with the property that its composition with the given model approximates the target conditional distribution. By parametrizing this new distribution as another flow model, we can efficiently train it using variational inference and also handle conditioning under arbitrary differentiable transformations. Since the resulting approximate posterior remains a flow, it offers exact likelihood evaluation, inversion, and efficient sampling. We provide extensive empirical evidence showcasing the flexibility of our method on a variety of inference tasks with applications to inverse problems. We also experimentally demonstrate that our approach is comparable to simple MCMC baselines in terms of sample quality. Further, we explain the failure of naively applying variational inference and show that our method does not suffer from the same issue.

1. INTRODUCTION

Generative modeling has seen unprecedented growth in recent years. Building on the success of deep learning, deep generative models have shown an impressive ability to model complex distributions in a variety of domains and modalities. Among them, normalizing flow models (see Papamakarios et al. (2019) and references therein) stand out for their computational flexibility, as they offer efficient sampling, likelihood evaluation, and inversion. While other types of models currently outperform flow models in terms of likelihood and sample quality, flow models have the advantage that they are relatively easy to train using maximum likelihood and do not suffer from issues that other models possess (e.g., mode collapse for GANs, posterior collapse for VAEs, slow sampling for autoregressive models). These characteristics make normalizing flows attractive for a variety of downstream tasks, including density estimation, inverse problems, semi-supervised learning, reinforcement learning, and audio synthesis (Ho et al., 2019; Asim et al., 2019; Atanov et al., 2019; Ward et al., 2019; Oord et al., 2018).

Even with such computational flexibility, how to perform efficient probabilistic inference on a flow model remains largely unknown. This question is becoming increasingly important as generative models grow in size and the computational resources necessary to train them from scratch are out of reach for many researchers and practitioners¹. If it were possible to perform probabilistic inference on flow models, we could re-purpose these powerful pre-trained generators for numerous custom tasks. This is the central question we study in this paper: given a flow model p(x), estimate the conditional distribution p(x2 | x1) for some partitioning of the variables x = (x1, x2). Existing methods for this task largely fall into two categories: Markov chain Monte Carlo (MCMC) and variational inference (VI).
While MCMC methods can perform exact conditional sampling in theory, they often have prohibitively long mixing times for complex high-dimensional distributions and do not provide likelihoods. VI, on the other hand, allows for approximate likelihood evaluation under the variational posterior and fast sampling, but at lower sample quality than its MCMC counterparts. We propose a novel method that leverages a powerful pre-trained flow model by constructing carefully designed latent codes to generate conditional samples via variational inference. While this procedure is intractable for latent variable models in general, the invertibility of the pre-trained model yields a tractable algorithm for learning a distribution in the latent space whose samples approximately match the true conditional when fed through the pre-trained model.
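To make the latent-space idea concrete, the following is a minimal sketch of our own (not the paper's algorithm) of reparameterized variational inference in a toy setting where everything is checkable by hand: the "pre-trained flow" is taken to be the identity map, so the model joint p(x1, x2) is a standard 2D Gaussian and the true conditional p(x2 | x1) is N(0, 1) for every x1. A one-dimensional Gaussian pre-generator q with parameters (mu, log_s) is then fit by stochastic gradient descent on the negative ELBO; all names and constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained flow: the identity map, so the model's
# joint p(x1, x2) is a standard 2D Gaussian and the true conditional
# p(x2 | x1) is N(0, 1) regardless of x1.
def joint_log_prob(x1, x2):
    return -0.5 * (x1 ** 2 + x2 ** 2) - np.log(2.0 * np.pi)

x1_obs = 0.7              # the observed (conditioned-on) variable
mu, log_s = -2.0, 1.0     # parameters of the Gaussian pre-generator q
lr, n = 0.05, 256

for _ in range(2000):
    eps = rng.normal(size=n)
    s = np.exp(log_s)
    x2 = mu + s * eps     # reparameterized samples from q
    # Negative ELBO (up to constants): E_q[0.5 * x2^2] - log s.  The two
    # lines below are its exact reparameterization gradients, derived by
    # hand for this Gaussian q.
    g_mu = np.mean(x2)
    g_log_s = np.mean(x2 * eps) * s - 1.0
    mu -= lr * g_mu
    log_s -= lr * g_log_s

# Monte Carlo estimate of the final ELBO; when KL(q || p(x2 | x1)) is
# (near) zero, the ELBO approaches the log marginal log p(x1_obs).
eps = rng.normal(size=100_000)
x2 = mu + np.exp(log_s) * eps
log_q = -0.5 * eps ** 2 - log_s - 0.5 * np.log(2.0 * np.pi)
elbo = np.mean(joint_log_prob(x1_obs, x2) - log_q)
```

After training, (mu, exp(log_s)) should be close to (0, 1), i.e., q has recovered the true conditional, and the ELBO gap to log p(x1_obs) quantifies the remaining approximation error. The method in this paper replaces the Gaussian q with a full flow model and the identity map with a deep pre-trained flow.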

Our contributions:

• We begin with a theoretical hardness result: even though flow models are designed to provide efficient inversion and sampling, even approximate sampling from the exact conditional distribution is provably computationally intractable for a wide class of flow models. This motivates our approach of smoothing the observation.

• We develop a method to estimate the target conditional distribution by composing a second flow model (which we call the pre-generator) with the given model. In particular, this parametrization allows us to employ variational inference and avoid the unstable adversarial training explored in existing work.

• The resulting approximate posterior retains the computational flexibility of a flow model and can be used on a wide variety of downstream tasks that require fast sampling, exact likelihood evaluation, or inversion. Compared to MCMC methods, it has the benefit of generating samples that are guaranteed to be i.i.d.

• We experimentally show that our approach is comparable to simple MCMC baselines in terms of sample quality metrics such as the Fréchet Inception Distance (Heusel et al., 2017). We also demonstrate that it achieves superior conditional likelihood estimation compared to regular variational inference.

• We extend and validate our method for conditioning under arbitrary differentiable transformations, with applications to inverse problems. We qualitatively demonstrate its flexibility on various complex inference tasks.

2. BACKGROUND

2.1 NORMALIZING FLOWS

Normalizing flow models (also known as invertible generative models) represent complex probability distributions by transforming simple input noise z (typically standard Gaussian) through a differentiable bijection f : R^d → R^d. Since f is invertible, the change-of-variables formula allows us to compute the probability density of x = f(z):

log p(x) = log p(z) + log |det (df⁻¹/dx)(x)|,

where df⁻¹/dx denotes the Jacobian of the inverse transformation f⁻¹ : x → z. Flow models are explicitly designed so that the above expression, including the log-determinant term, can be computed easily. This tractability allows them to be trained directly on data with a maximum likelihood objective.

One of the simplest invertible layer constructions is the additive coupling layer introduced by Dinh et al. (2015), which served as a basis for many subsequently proposed models. In an additive coupling layer, the input variable is partitioned as x = (x1, x2) ∈ R^d1 × R^d2. The layer is parametrized by a neural network g : R^d1 → R^d2 used to additively transform x2. The layer's output y = (y1, y2) ∈ R^d1 × R^d2 and its inverse can be computed as follows:

y1 = x1,  y2 = x2 + g(x1)   ⟺   x1 = y1,  x2 = y2 − g(y1).
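As a concrete illustration of this layer (a minimal NumPy sketch of our own, with a fixed random affine map standing in for the trained conditioner network g), the forward pass, its exact inverse, and the resulting likelihood evaluation can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the conditioner network g : R^{d1} -> R^{d2}; in a real
# flow this is a trained neural network (here d1 = 2, d2 = 3).
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)

def g(x1):
    return np.tanh(W @ x1 + b)

def forward(x1, x2):
    # y1 = x1, y2 = x2 + g(x1)
    return x1, x2 + g(x1)

def inverse(y1, y2):
    # x1 = y1, x2 = y2 - g(y1): exact inversion with one pass through g
    return y1, y2 - g(y1)

def log_prob(y1, y2):
    # The Jacobian of an additive coupling layer is triangular with unit
    # diagonal, so the log-determinant term is zero and the
    # change-of-variables formula reduces to evaluating the standard
    # Gaussian base density at the inverted point.
    z = np.concatenate(inverse(y1, y2))
    return -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)

z1, z2 = rng.normal(size=2), rng.normal(size=3)
y1, y2 = forward(z1, z2)
x1_rec, x2_rec = inverse(y1, y2)
assert np.allclose(z1, x1_rec) and np.allclose(z2, x2_rec)
```

Note that g is never inverted, so it can be an arbitrary network; expressive power comes from stacking many such layers while alternating which half of the variables is transformed.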



¹ For example, Kingma & Dhariwal (2018) report that their largest model had 200M parameters and was trained on 40 GPUs for a week.



Starting from the early works of Dinh et al. (2015) and Rezende & Mohamed (2015), there has been extensive research on invertible architectures for generative modeling. Many of them work by composing a series of invertible layers, as in RealNVP (Dinh et al., 2016), IAF (Kingma et al., 2016), Glow (Kingma & Dhariwal, 2018), invertible ResNet (Behrmann et al., 2019), and Neural Spline Flows (Durkan et al., 2019).

