ACCELERATING HAMILTONIAN MONTE CARLO VIA CHEBYSHEV INTEGRATION TIME

Abstract

Hamiltonian Monte Carlo (HMC) is a popular method in sampling. While there are quite a few works studying this method from various aspects, an interesting question is how to choose its integration time to achieve acceleration. In this work, we consider accelerating the process of sampling from a distribution π(x) ∝ exp(-f(x)) via HMC with time-varying integration time. When the potential f is L-smooth and m-strongly convex, i.e. for sampling from a log-smooth and strongly log-concave target distribution π, it is known that under a constant integration time, the number of iterations that ideal HMC takes to reach Wasserstein-2 distance ε to the target π is O(κ log(1/ε)), where κ := L/m is the condition number. We propose a scheme of time-varying integration time based on the roots of Chebyshev polynomials. We show that in the case of a quadratic potential f, i.e. when the target π is a Gaussian distribution, ideal HMC with this choice of integration time takes only O(√κ log(1/ε)) iterations to reach Wasserstein-2 distance less than ε; this improvement in the dependence on the condition number is akin to acceleration in optimization. The design and analysis of HMC with the proposed integration time are built on tools from Chebyshev polynomials. Experiments show the advantage of adopting our scheme of time-varying integration time even for sampling from distributions with smooth strongly convex potentials that are not quadratic.

1. INTRODUCTION

Markov chain Monte Carlo (MCMC) algorithms are fundamental techniques for sampling from probability distributions, a task that naturally arises in statistics (Duane et al., 1987; Girolami & Calderhead, 2011), optimization (Flaxman et al., 2005; Duchi et al., 2012; Jin et al., 2017), machine learning, and other areas (Wenzel et al., 2020; Salakhutdinov & Mnih, 2008; Koller & Friedman, 2009; Welling & Teh, 2011). Among all the MCMC algorithms, perhaps the most popular ones are Langevin methods (Li et al., 2022; Dalalyan, 2017; Durmus et al., 2019; Vempala & Wibisono, 2019; Lee et al., 2021b; Chewi et al., 2020) and Hamiltonian Monte Carlo (HMC) (Neal, 2012; Betancourt, 2017; Hoffman & Gelman, 2014; Levy et al., 2018). For the former, there has recently been a sequence of works leveraging techniques from optimization to design Langevin methods, which include borrowing the idea of momentum methods like Nesterov acceleration (Nesterov, 2013) to design fast methods, e.g., (Ma et al., 2021; Dalalyan & Riou-Durand, 2020). Specifically, Ma et al. (2021) show that for sampling from distributions satisfying the log-Sobolev inequality, under-damped Langevin improves the iteration complexity of over-damped Langevin from O(d/ε) to O(√(d/ε)), where d is the dimension and ε is the error in KL divergence, though whether their result has an optimal dependency on the condition number is not clear. On the other hand, compared to Langevin methods, the connection between HMCs and techniques in optimization seems rather loose. Moreover, to our knowledge, little is known about how to accelerate HMCs with a provable acceleration guarantee for converging to a target distribution. Specifically, Chen & Vempala (2019) show that for sampling from strongly log-concave distributions, the iteration complexity of ideal HMC is O(κ log(1/ε)), and Vishnoi (2021) shows the same rate of ideal HMC when the potential is strongly convex quadratic in a nice tutorial.
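To make the contrast with the constant-integration-time rate concrete, consider the quadratic case: for f(x) = x᠎ᵀAx/2 the Hamiltonian flow is solvable in closed form, and the mean of an eigenmode with eigenvalue λ contracts by a factor cos(√λ · t) after integration time t. The sketch below (a plausible instantiation of a Chebyshev-root schedule, not necessarily the paper's exact construction: it takes the K Chebyshev roots r_k on [m, L] and sets t_k = π/(2√r_k) so that mode λ = r_k is annihilated at step k) compares the worst-case contraction of the mean over λ ∈ [m, L] against a constant schedule with the same number of steps.

```python
import math

def chebyshev_times(m, L, K):
    """Hypothetical time-varying schedule: one integration time per
    Chebyshev root of degree K on the spectrum range [m, L]."""
    times = []
    for k in range(1, K + 1):
        # k-th Chebyshev root mapped to [m, L]
        r_k = 0.5 * (L + m) + 0.5 * (L - m) * math.cos((2 * k - 1) * math.pi / (2 * K))
        # choose t_k so that cos(sqrt(r_k) * t_k) = 0: mode lambda = r_k dies
        times.append(math.pi / (2.0 * math.sqrt(r_k)))
    return times

def worst_contraction(lams, times):
    """Max over eigenvalues of |prod_k cos(sqrt(lam) * t_k)|, i.e. the
    worst-case factor by which ideal HMC contracts the mean of the target."""
    return max(
        abs(math.prod(math.cos(math.sqrt(lam) * t) for t in times))
        for lam in lams
    )

m, L, K = 1.0, 100.0, 20          # condition number kappa = 100
lams = [m + (L - m) * i / 999 for i in range(1000)]   # grid over [m, L]

cheb = worst_contraction(lams, chebyshev_times(m, L, K))
const = worst_contraction(lams, [math.pi / (2.0 * math.sqrt(L))] * K)
print(f"Chebyshev schedule: {cheb:.4f}, constant schedule: {const:.4f}")
```

With κ = 100 and K = 20 steps, the Chebyshev schedule drives the worst-case mean contraction far below that of the constant schedule, consistent with the √κ versus κ gap discussed above; the slowest mode (λ = m) is the bottleneck for the constant choice.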
In contrast, there are a few methods that exhibit acceleration when minimizing strongly convex quadratic functions in optimization. For example, while Heavy Ball (Polyak, 1964) does not have an accelerated linear rate globally for minimizing general smooth strongly convex functions, it does show acceleration when minimizing strongly convex quadratic functions (Wang et al., 2020;  

