SAMPLING WITH MOLLIFIED INTERACTION ENERGY DESCENT

Abstract

Sampling from a target measure whose density is known only up to a normalization constant is a fundamental problem in computational statistics and machine learning. In this paper, we present a new optimization-based method for sampling called mollified interaction energy descent (MIED). MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs). These energies rely on mollifier functions: smooth approximations of the Dirac delta that originate in PDE theory. We show that as the mollifier approaches the Dirac delta, the MIE converges to the chi-square divergence with respect to the target measure, and the minimizers of the MIE converge to the target measure. Optimizing this energy with a suitable discretization yields a practical first-order particle-based algorithm for sampling in both unconstrained and constrained domains. We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms such as SVGD, while for constrained sampling problems our method readily incorporates constrained optimization techniques to handle more flexible constraints, with strong performance compared to alternatives.
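To make the notion of a mollifier concrete, the sketch below illustrates one standard example (a Gaussian mollifier; the specific mollifier families used in the paper are not assumed here): a smooth, nonnegative function that integrates to one for every bandwidth eps, and that concentrates toward the Dirac delta as eps shrinks.

```python
import numpy as np

def gaussian_mollifier(x, eps):
    """Gaussian mollifier phi_eps(x) = (2*pi*eps^2)^(-1/2) * exp(-x^2 / (2*eps^2)).

    For every eps > 0 it is smooth and integrates to 1; as eps -> 0 it
    concentrates its mass at the origin, approximating the Dirac delta.
    """
    return np.exp(-x**2 / (2 * eps**2)) / np.sqrt(2 * np.pi * eps**2)

# Numerically check the two defining properties on a fine grid.
xs = np.linspace(-1.0, 1.0, 20001)
dx = xs[1] - xs[0]

masses, peaks = [], []
for eps in (0.2, 0.1, 0.02):
    masses.append(np.sum(gaussian_mollifier(xs, eps)) * dx)  # total mass stays ~1
    peaks.append(gaussian_mollifier(0.0, eps))               # peak grows as eps -> 0
```

The unit mass is what lets integrals against the mollifier approximate pointwise evaluation, and the growing peak is the sense in which phi_eps approaches the Dirac delta.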

1. INTRODUCTION

Sampling from an unnormalized probability density is a ubiquitous task in statistics, mathematical physics, and machine learning. While Markov chain Monte Carlo (MCMC) methods (Brooks et al., 2011) provide a way to obtain unbiased samples at the price of potentially long mixing times, variational inference (VI) methods (Blei et al., 2017) approximate the target measure with simpler (e.g., parametric) distributions at a lower computational cost. In this work, we focus on a particular class of VI methods that approximate the target measure using a collection of interacting particles. A primary example is Stein variational gradient descent (SVGD) proposed by Liu & Wang (2016), which iteratively applies deterministic updates to a set of particles to decrease the KL divergence to the target distribution. While MCMC and VI methods have found great success in sampling from unconstrained distributions, they often break down for distributions supported on a constrained domain. Constrained sampling is needed when the target density is undefined outside a given domain (e.g., the Dirichlet distribution), when the target density is not integrable over the entire Euclidean space (e.g., the uniform distribution), or when we only want samples that satisfy certain inequalities (e.g., fairness constraints in Bayesian inference (Liu et al., 2021)). A few recent approaches (Brubaker et al., 2012; Byrne & Girolami, 2013; Liu & Zhu, 2018; Shi et al., 2021) extend classical sampling methods like Hamiltonian Monte Carlo (HMC) or SVGD to constrained domains. These extensions, however, typically contain expensive numerical subroutines such as solving nonlinear systems of equations, and require explicit formulas for quantities such as Riemannian metric tensors or mirror maps to be derived on a case-by-case basis from the constraints.
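As background for the interacting-particle viewpoint, the sketch below implements the standard SVGD update of Liu & Wang (2016) with an RBF kernel: each particle moves along a kernel-weighted average of the score (driving particles toward high-density regions) plus a kernel-gradient term (repelling particles from each other). The step size, bandwidth, and Gaussian target are illustrative choices, not values from the paper.

```python
import numpy as np

def svgd_step(X, grad_log_p, step=0.1, h=1.0):
    """One SVGD update with RBF kernel k(x, y) = exp(-||x - y||^2 / h).

    x_i <- x_i + step * (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j)
                                        + grad_{x_j} k(x_j, x_i) ]
    """
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]          # diffs[j, i] = x_j - x_i
    K = np.exp(-np.sum(diffs**2, axis=-1) / h)     # K[j, i] = k(x_j, x_i)
    drive = K.T @ grad_log_p(X)                    # sum_j k(x_j, x_i) * score(x_j)
    # grad_{x_j} k(x_j, x_i) = -(2/h) * (x_j - x_i) * k(x_j, x_i), summed over j
    repulsion = np.sum(-(2.0 / h) * diffs * K[:, :, None], axis=0)
    return X + step * (drive + repulsion) / n

# Illustrative run: particles initialized far away drift toward a
# standard Gaussian target, whose score is grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=4.0, scale=0.5, size=(50, 2))
for _ in range(200):
    X = svgd_step(X, lambda X: -X)
```

The repulsion term is what distinguishes SVGD from running gradient ascent on log p independently per particle: without it, all particles would collapse onto the mode rather than spreading out to cover the target distribution.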

