SEMI-IMPLICIT VARIATIONAL INFERENCE VIA SCORE MATCHING

Abstract

Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families by considering implicit variational distributions defined in a hierarchical manner. However, because the densities of such variational distributions are intractable, current SIVI approaches often resort to surrogate evidence lower bounds (ELBOs) or rely on expensive inner-loop MCMC runs to maximize the ELBO directly during training. In this paper, we propose SIVI-SM, a new method for SIVI based on an alternative training objective derived via score matching. Leveraging the hierarchical structure of semi-implicit variational families, the score matching objective admits a minimax formulation in which the intractable variational densities can be handled naturally with denoising score matching. We show that SIVI-SM closely matches the accuracy of MCMC and outperforms ELBO-based SIVI methods on a variety of Bayesian inference tasks.

1. INTRODUCTION

Variational inference (VI) is an approximate Bayesian inference approach in which the inference problem is transformed into an optimization problem (Jordan et al., 1999; Wainwright & Jordan, 2008; Blei et al., 2017). It starts by introducing a family of variational distributions over the model parameters (or latent variables) to approximate the posterior. The goal is then to find the member of this family closest to the target posterior, where closeness is usually measured by the Kullback-Leibler (KL) divergence from the posterior to the variational approximation. In practice, this is often achieved by maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence (Jordan et al., 1999). One of the classical VI methods is mean-field VI (Bishop & Tipping, 2000), where the variational distributions are assumed to factorize over the parameters (or latent variables). When combined with conditional conjugacy, this often leads to simple optimization schemes with closed-form update rules (Blei et al., 2017). While popular, the factorization assumption and the conjugacy condition greatly restrict the flexibility and applicability of variational posteriors, especially for complicated models with high-dimensional parameter spaces. Recent years have witnessed much progress in VI that extends it to more complicated settings. For example, the conjugacy condition has been removed by black-box VI methods, which accommodate a broad class of models via Monte Carlo gradient estimators (Nott et al., 2012; Paisley et al., 2012; Ranganath et al., 2014; Rezende et al., 2014; Kingma & Welling, 2014).
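The equivalence between ELBO maximization and KL minimization can be stated in one line. For observed data $x$, latent variables $z$, and variational distribution $q(z)$ (generic notation, not specific to any one of the cited works), the log evidence decomposes as

```latex
\log p(x)
= \underbrace{\mathbb{E}_{q(z)}\bigl[\log p(x, z) - \log q(z)\bigr]}_{\mathrm{ELBO}(q)}
+ \mathrm{KL}\bigl(q(z)\,\|\,p(z \mid x)\bigr).
```

Since $\log p(x)$ does not depend on $q$, maximizing the ELBO over the variational family is equivalent to minimizing $\mathrm{KL}(q(z)\,\|\,p(z \mid x))$, and the ELBO is tight exactly when $q(z) = p(z \mid x)$.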
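As a concrete illustration of such Monte Carlo gradient estimators, the following is a minimal sketch of the reparameterization estimator of Kingma & Welling (2014) for a diagonal Gaussian variational family; the function names, the toy $\mathcal{N}(3, 1)$ target, and the step sizes are illustrative choices, not part of any cited method.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_grad_reparam(mu, log_sigma, log_joint_grad, n_samples=100):
    """Reparameterized Monte Carlo gradient of the ELBO for a diagonal
    Gaussian q(z) = N(mu, sigma^2); `log_joint_grad` returns the gradient
    of log p(x, z) with respect to z (illustrative interface)."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + sigma * eps                  # reparameterization: z = mu + sigma * eps
    g = log_joint_grad(z)                 # d log p(x, z) / dz at the samples
    grad_mu = g.mean(axis=0)
    # chain rule: dz/d(log sigma) = sigma * eps; the Gaussian entropy
    # contributes +1 per dimension to the log-sigma gradient
    grad_log_sigma = (g * sigma * eps).mean(axis=0) + 1.0
    return grad_mu, grad_log_sigma

# Toy usage: fit q to an N(3, 1) target, whose score is -(z - 3)
mu, log_sigma = np.zeros(1), np.zeros(1)
for _ in range(2000):
    g_mu, g_ls = elbo_grad_reparam(mu, log_sigma, lambda z: -(z - 3.0))
    mu += 0.05 * g_mu
    log_sigma += 0.05 * g_ls
```

Stochastic gradient ascent with this estimator drives `mu` toward 3 and `sigma` toward 1, recovering the target; the key point is that no conjugacy or closed-form updates are needed, only samples from the base noise and gradients of the log joint.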
On the other hand, more flexible variational families have been proposed that either explicitly incorporate more complicated structure among the parameters (Jaakkola & Jordan, 1998; Saul & Jordan, 1996; Giordano et al., 2015; Tran et al., 2015) or borrow ideas from invertible transformations of probability distributions (Rezende & Mohamed, 2015; Dinh et al., 2017; Kingma et al., 2016; Papamakarios et al., 2019). All these methods require tractable densities for the variational distributions. It turns out that the variational family can be expanded further by allowing implicit models that have intractable densities but are easy to sample from (Huszár, 2017). One way to construct these implicit models is to transform a simple base distribution via a deterministic map, i.e., a deep neural

