PROJECTED LATENT MARKOV CHAIN MONTE CARLO: CONDITIONAL SAMPLING OF NORMALIZING FLOWS

Abstract

We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the exact conditional distributions learned by normalizing flows. As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. Through experimental tests applying normalizing flows to missing data tasks for a variety of data sets, we demonstrate the efficacy of PL-MCMC for conditional sampling from normalizing flows.

1. INTRODUCTION

Conditional sampling from modeled joint probability distributions offers a statistical framework for approaching tasks involving missing and incomplete data. Deep generative models have demonstrated an exceptional capability for approximating the distributions governing complex data. Brief analysis illustrates a fundamental guarantee for generative models: the inaccuracy (i.e. divergence from ground truth) of a generative model's approximated joint distribution upper bounds the expected inaccuracies of the conditional distributions known by the model, as shown in Appendix A. Although this guarantee holds for all generative models, specialized variants are typically used to approach tasks involving the conditional distributions among modeled variables, due to the difficulty of accessing the conditional distributions known by unspecialized generative models. Quite often, otherwise well-trained generative models possess a capability for conditional inference that remains inaccessible in practice. Normalizing flow architectures like RealNVP (Dinh et al., 2014) and GLOW (Kingma & Dhariwal, 2018) have demonstrated accurate and expressive generative performance, showing great promise for application to missing data tasks. Additionally, by enabling the calculation of exact likelihoods, normalizing flows offer convenient mathematical properties for approaching exact conditional sampling. We are therefore motivated to develop techniques for sampling from the exact conditional distributions known by normalizing flows. In this paper, we propose Projected Latent Markov Chain Monte Carlo (PL-MCMC), a conditional sampling technique that takes advantage of the convenient mathematical structure of normalizing flows by defining a Markov Chain within a flow's latent space and accepting proposed transitions based on the likelihood of the resulting imputation.
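The upper-bound guarantee follows directly from the chain rule for KL divergence. Writing p for the ground truth, q for the model, and splitting each data value into missing and observed portions x = (x_M; x_O), a sketch of the argument is:

```latex
\begin{aligned}
D_{\mathrm{KL}}\!\left(p(x) \,\|\, q(x)\right)
&= D_{\mathrm{KL}}\!\left(p(x_O) \,\|\, q(x_O)\right)
 + \mathbb{E}_{x_O \sim p}\!\left[ D_{\mathrm{KL}}\!\left(p(x_M \mid x_O) \,\|\, q(x_M \mid x_O)\right) \right] \\
&\geq \mathbb{E}_{x_O \sim p}\!\left[ D_{\mathrm{KL}}\!\left(p(x_M \mid x_O) \,\|\, q(x_M \mid x_O)\right) \right],
\end{aligned}
```

since both terms in the decomposition are non-negative. An accurate joint model therefore cannot have, on average, inaccurate conditionals.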
In principle, PL-MCMC enables exact conditional sampling without requiring specialized architecture, training history, or external inference machinery.

Our Contributions: We prove that a Metropolis-Hastings implementation of our proposed PL-MCMC technique is asymptotically guaranteed to sample from the exact conditional distributions known by any normalizing flow satisfying very mild positivity and smoothness requirements. We then describe how to use PL-MCMC to perform Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete training data. To illustrate and demonstrate aspects of the technique, we perform a series of experiments utilizing PL-MCMC to complete CIFAR-10 images, CelebA images, and MNIST digits affected by missing data. Finally, we perform a series of experiments training non-specialized normalizing flows to model MNIST digits and continuous UCI datasets from incomplete training data to verify the performance of the proposed method. Through these experimental results, we find that PL-MCMC holds great practical promise for tasks requiring conditional sampling from normalizing flows.

2. RELATED WORK

A conditional variant of normalizing flows has been introduced by Lu & Huang (2020) to model a single conditional distribution between architecturally fixed sets of conditioned and conditioning variables. While quite capable of learning individual conditional distributions, conditional variants do not enable arbitrary conditional sampling from a joint model. Richardson et al. (2020) concurrently train a deterministic inference network alongside a normalizing flow for inferring missing data. Although such an inference network can produce deterministic imputations consistent with the distributions learned by a normalizing flow, it cannot stochastically sample from the conditional distributions known by the flow. Li et al. (2019) introduce shared parameter approximations that allow the derivation of approximate conditional normalizing flows, though these approximations do not guarantee exact sampling from the conditional distributions of a particular joint model. Similar techniques for approaching missing data with other generative models, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have been introduced with similar limitations (Ivanov et al., 2018; Yoon et al., 2018; Li et al., 2018). An MCMC procedure for sampling from the conditional distributions of VAEs has been introduced by Rezende et al. (2014) and refined by Mattei & Frellsen (2018). This procedure fundamentally relies on the many-to-many relationship between the latent and modeled data spaces of VAEs, and cannot be directly applied to normalizing flows, wherein the latent state uniquely determines (and is uniquely determined by) the modeled data state. By following an unconstrained Markov Chain within the latent space, PL-MCMC mirrors this VAE conditional sampling procedure within the context of normalizing flows. PL-MCMC leverages the probabilistic structure learned by a normalizing flow to produce efficient Markov Chains.
The utility of the mathematical structure of normalizing flows for approaching Monte Carlo estimation via independence sampling has been demonstrated by Müller et al. (2019). The probabilistic structure of normalizing flows has also been shown to improve unconditional sampling from externally defined distributions by Hoffman et al. (2019). In using this learned structure, we believe that PL-MCMC receives many of the benefits of Adaptive Monte Carlo methods (Haario et al., 2001; Foreman-Mackey et al., 2013; Zhu, 2019), as explained in Appendix B. PL-MCMC's unconstrained Markov Chain through the latent space is not the only conceivable option for sampling from the conditional distributions described by normalizing flows. As normalizing flows enable exact joint likelihood calculations, we could employ MCMC methods through the modeled data space. Dinh et al. (2014) demonstrate a stochastic conditional MAP inference that can be adapted to implement the unadjusted Langevin algorithm (Fredrickson et al., 2006; Durmus et al., 2019) or the Metropolis adjusted Langevin algorithm (Grenander & Miller, 1994). A constrained Hamiltonian Monte Carlo approach has also been introduced in the context of conditional sampling from generative models by Graham et al. (2017). MCMC methods restricted to the modeled data space treat the normalizing flow as a sort of black-box oracle to be used only for calculations regarding data likelihood. By design, PL-MCMC leverages the flow's one-to-one mapping between latent and modeled data spaces, thereby taking better advantage of the probabilistic structure learned by our normalizing flows to perform conditional sampling.

3. THE PL-MCMC APPROACH

We consider a normalizing flow between latent space Ξ and modeled data space X, defining the mappings f_θ : Ξ → X and f_θ⁻¹ : X → Ξ. This normalizing flow imposes the probability density p_{f,θ}(x) onto all modeled data values x ∈ X. By the pairing (x_M; x_O), we denote the missing and observed portions of a modeled data value, with joint density p_{f,θ}(x_M; x_O) under our normalizing flow. Our goal is to sample from the conditional density described by the normalizing flow, p_{f,θ}(x_M | x_O).
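To make the setup concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of the objects just defined: a single affine coupling layer standing in for f_θ, its inverse, and the exact model density obtained via the change-of-variables formula. The class name and toy scale/shift parameters are illustrative assumptions.

```python
import numpy as np

class AffineCoupling:
    """Toy bijection f_theta on R^2: passes the first coordinate through
    unchanged and affinely transforms the second, as in RealNVP."""
    def __init__(self, w=0.5, b=0.1):
        self.w, self.b = w, b  # stand-ins for the scale/shift networks

    def forward(self, xi):               # f_theta : Xi -> X
        z1, z2 = xi
        s, t = self.w * z1, self.b * z1  # scale and shift computed from z1
        return np.array([z1, z2 * np.exp(s) + t])

    def inverse(self, x):                # f_theta^{-1} : X -> Xi
        x1, x2 = x
        s, t = self.w * x1, self.b * x1
        return np.array([x1, (x2 - t) * np.exp(-s)])

    def log_det_jac(self, xi):           # log |det df/dxi| = s
        return self.w * xi[0]

def log_density(flow, x):
    """Exact log p_{f,theta}(x) via change of variables, with a standard
    normal base density on the latent space."""
    xi = flow.inverse(x)
    log_base = -0.5 * np.sum(xi**2) - np.log(2 * np.pi)
    return log_base - flow.log_det_jac(xi)
```

The one-to-one structure is visible here: every x ∈ X corresponds to exactly one ξ ∈ Ξ, which is what rules out the VAE-style conditional sampling procedures discussed in Section 2.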

3.1. THE PROJECTED LATENT TARGET DISTRIBUTION

Rather than targeting the conditional distribution of missing values directly, PL-MCMC targets a distribution of latent variables that, after mapping through the flow's transformation, marginalizes to the desired conditional distribution. Let the Markov Chain be composed of latent state ξ ∈ Ξ,
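As a simplified, self-contained illustration of this idea (not the paper's exact projected latent target, which involves an auxiliary density over the observed portion), the sketch below runs a random-walk Metropolis-Hastings chain over latent states ξ for a fixed invertible linear flow f(ξ) = Aξ on R², with coordinate 0 observed and coordinate 1 missing. The target combines the standard-normal latent prior with a Gaussian term concentrating the chain on latents whose image matches the observation; all names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed invertible linear map standing in for the flow's transformation.
A = np.array([[1.0, 0.5],
              [0.3, 1.2]])

def f(xi):
    return A @ xi

def log_target(xi, x_obs, sigma=0.05):
    """Unnormalized log density over latent states: standard-normal prior
    plus an auxiliary term pulling f(xi)'s observed coordinate toward x_obs."""
    x = f(xi)
    log_prior = -0.5 * xi @ xi
    log_aux = -0.5 * ((x[0] - x_obs) / sigma) ** 2
    return log_prior + log_aux

def pl_mcmc_sketch(x_obs, n_steps=2000, step=0.3):
    """Random-walk Metropolis-Hastings in latent space; returns the chain's
    imputations of the missing coordinate."""
    xi = rng.standard_normal(2)
    samples = []
    for _ in range(n_steps):
        prop = xi + step * rng.standard_normal(2)   # proposal in latent space
        if np.log(rng.uniform()) < log_target(prop, x_obs) - log_target(xi, x_obs):
            xi = prop                                # Metropolis accept/reject
        samples.append(f(xi)[1])                     # projected imputation
    return np.array(samples)
```

For example, `pl_mcmc_sketch(1.0)` yields a chain of imputations for the missing coordinate given the observation x_O = 1.0. The key structural point survives the simplification: the chain moves freely in Ξ, and conditioning enters only through the likelihood of the imputation induced by mapping through the flow.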

