ANY-SCALE BALANCED SAMPLERS FOR DISCRETE SPACES

Abstract

The locally balanced informed proposal has proved highly effective for sampling from discrete spaces. However, its success relies on the "local" factor: whenever the proposal distribution is restricted to be near the current state, the locally balanced weight functions are asymptotically optimal and the gradient approximations are accurate. In search of more efficient sampling algorithms, many recent works have considered increasing the scale of the proposal distribution, but at larger scales the "local" factor no longer holds. To close this gap for non-local proposals, we propose any-scale balanced samplers. In particular, we substitute the locally balanced function with an any-scale balanced function that can self-adjust to achieve better efficiency for proposal distributions at any scale. We also use quadratic approximations to capture the curvature of the target distribution and reduce the error in the gradient approximation, while employing a Gaussian integral trick with a specially estimated diagonal to efficiently sample from the quadratic proposal distribution. On various synthetic and real distributions, the proposed sampler substantially outperforms existing approaches.

1. INTRODUCTION

The Markov Chain Monte Carlo (MCMC) algorithm is one of the most widely used methods for sampling from intractable distributions (Robert et al., 1999). Gradient-based samplers that leverage gradient information to guide the proposal have achieved significant advances in sampling from continuous spaces, demonstrated, for example, by the Metropolis Adjusted Langevin Algorithm (MALA) (Rossky et al., 1978), Hamiltonian Monte Carlo (HMC) (Duane et al., 1987), and related variants (Girolami & Calderhead, 2011; Hoffman et al., 2014). However, for discrete spaces, gradient-based samplers remain far less well understood. Recently, a family of locally balanced (LB) samplers (Zanella, 2020; Grathwohl et al., 2021; Sun et al., 2021; 2022a; Zhang et al., 2022) has demonstrated promise in sampling from discrete spaces. Such samplers use a locally balanced weight function in an informed proposal Q(x, y) ∝ g(π(y)/π(x)) K_σ(x − y), where g : R → R is a weight function that satisfies g(t) = t g(1/t), π is the target distribution, and K_σ is a kernel that determines the scale of the proposal distribution. It has also been shown that such a locally balanced informed proposal is a discrete version of MALA, since both simulate gradient flows in the Wasserstein manifold (Sun et al., 2022a). In initial work, Zanella (2020) considered a local proposal with a kernel K_σ that restricts next states to lie within a 1-Hamming ball, seeking to capture natural discrete topological structure arising, for example, in spaces of trees, partitions, or permutations. For more regular discrete spaces, such as lattices, Grathwohl et al. (2021) introduced a gradient approximation for the probability ratio, π(y)/π(x) ≈ exp(⟨y − x, ∇ log π(x)⟩), to make the locally balanced proposal more scalable.
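To make the construction above concrete, the following is a minimal sketch (not the paper's method) of a locally balanced proposal on {0,1}^d restricted to the 1-Hamming ball, using the balanced weight g(t) = √t, which satisfies g(t) = t g(1/t), together with the gradient approximation π(y)/π(x) ≈ exp(⟨y − x, ∇ log π(x)⟩). The target `log_pi` and all function names are illustrative assumptions chosen so the approximation can be checked against exact ratios.

```python
import numpy as np

def log_pi(x, theta):
    """Unnormalized log-probability of a toy factorized binary model:
    log pi(x) = <theta, x>. Chosen so the gradient approximation is exact."""
    return theta @ x

def grad_log_pi(x, theta):
    """Gradient of the continuous relaxation of log_pi at x."""
    return theta

def proposal_probs(x, theta, g=np.sqrt):
    """Locally balanced proposal over the d single-bit flips of x.

    All d approximate ratios pi(y)/pi(x) come from a single gradient
    evaluation: flipping bit i changes x_i by (1 - 2 x_i), so
    <y - x, grad log pi(x)> = (1 - 2 x_i) * grad_i.
    """
    grad = grad_log_pi(x, theta)
    log_ratios = (1.0 - 2.0 * x) * grad
    weights = g(np.exp(log_ratios))       # g(t) = sqrt(t) is balanced
    return weights / weights.sum()

rng = np.random.default_rng(0)
d = 8
theta = rng.normal(size=d)
x = rng.integers(0, 2, size=d).astype(float)

p = proposal_probs(x, theta)

# For this factorized target the approximation is exact, so the proposal
# matches the one built from exact probability ratios.
exact = np.empty(d)
for i in range(d):
    y = x.copy()
    y[i] = 1.0 - y[i]
    exact[i] = np.sqrt(np.exp(log_pi(y, theta) - log_pi(x, theta)))
exact /= exact.sum()
assert np.allclose(p, exact)

i = rng.choice(d, p=p)  # propose flipping bit i; a full sampler would
                        # then apply a Metropolis-Hastings accept/reject step
```

For a non-factorized target the gradient approximation is no longer exact, and its error grows with the proposal scale σ, which is precisely the regime the any-scale balanced samplers are designed to handle.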

