ACCELERATING HAMILTONIAN MONTE CARLO VIA CHEBYSHEV INTEGRATION TIME

Abstract

Hamiltonian Monte Carlo (HMC) is a popular method in sampling. While there are quite a few works studying various aspects of this method, an interesting question is how to choose its integration time to achieve acceleration. In this work, we consider accelerating the process of sampling from a distribution π(x) ∝ exp(-f(x)) via HMC with a time-varying integration time. When the potential f is L-smooth and m-strongly convex, i.e., for sampling from a log-smooth and strongly log-concave target distribution π, it is known that under a constant integration time, the number of iterations that ideal HMC takes to get within ε Wasserstein-2 distance of the target π is O(κ log(1/ε)), where κ := L/m is the condition number. We propose a scheme of time-varying integration time based on the roots of Chebyshev polynomials. We show that in the case of a quadratic potential f, i.e., when the target π is a Gaussian distribution, ideal HMC with this choice of integration time only takes O(√κ log(1/ε)) iterations to reach Wasserstein-2 distance less than ε; this improvement in the dependence on the condition number is akin to acceleration in optimization. The design and analysis of HMC with the proposed integration time is built on the tools of Chebyshev polynomials. Experiments find the advantage of adopting our scheme of time-varying integration time even for sampling from distributions with smooth strongly convex potentials that are not quadratic.

1. INTRODUCTION

Markov chain Monte Carlo (MCMC) algorithms are fundamental techniques for sampling from probability distributions, a task that naturally arises in statistics (Duane et al., 1987; Girolami & Calderhead, 2011), optimization (Flaxman et al., 2005; Duchi et al., 2012; Jin et al., 2017), machine learning, and other areas (Wenzel et al., 2020; Salakhutdinov & Mnih, 2008; Koller & Friedman, 2009; Welling & Teh, 2011). Among all the MCMC algorithms, perhaps the most popular ones are Langevin methods (Li et al., 2022; Dalalyan, 2017; Durmus et al., 2019; Vempala & Wibisono, 2019; Lee et al., 2021b; Chewi et al., 2020) and Hamiltonian Monte Carlo (HMC) (Neal, 2012; Betancourt, 2017; Hoffman & Gelman, 2014; Levy et al., 2018). For the former, there has recently been a sequence of works leveraging techniques from optimization to design Langevin methods, including borrowing the idea of momentum methods like Nesterov acceleration (Nesterov, 2013) to design fast methods, e.g., (Ma et al., 2021; Dalalyan & Riou-Durand, 2020). Specifically, Ma et al. (2021) show that for sampling from distributions satisfying the log-Sobolev inequality, under-damped Langevin improves the iteration complexity of over-damped Langevin from O(d/ε) to O(√d/ε), where d is the dimension and ε is the error in KL divergence, though whether their result has an optimal dependence on the condition number is not clear. On the other hand, compared to Langevin methods, the connection between HMC and techniques in optimization seems rather loose. Moreover, to our knowledge, little is known about how to accelerate HMC with a provable acceleration guarantee for converging to a target distribution. Specifically, Chen & Vempala (2019) show that for sampling from strongly log-concave distributions, the iteration complexity of ideal HMC is O(κ log(1/ε)), and Vishnoi (2021) shows the same rate of ideal HMC when the potential is a strongly convex quadratic in a nice tutorial.
In contrast, there are a few methods that exhibit acceleration when minimizing strongly convex quadratic functions in optimization. For example, while Heavy Ball (Polyak, 1964) does not have an accelerated linear rate globally for minimizing general smooth strongly convex functions, it does show acceleration when minimizing strongly convex quadratic functions (Wang et al., 2020; 2021; 2022). This observation makes us wonder whether one can get an accelerated linear rate of ideal HMC for sampling, i.e., O(√κ log(1/ε)), akin to acceleration in optimization. We answer this question affirmatively, at least in the Gaussian case. We propose a time-varying integration time for HMC, and we show that ideal HMC with this time-varying integration time exhibits acceleration when the potential is a strongly convex quadratic (i.e., the target π is a Gaussian), compared to what is established in Chen & Vempala (2019) and Vishnoi (2021) for a constant integration time. Our proposed time-varying integration time at each iteration of HMC depends on the total number of iterations K, the current iteration index k, the strong convexity constant m, and the smoothness constant L of the potential; therefore, the integration time at each iteration is simple to compute and is set before executing HMC. Our proposed integration time is based on the roots of Chebyshev polynomials, which we describe in detail in the next section. In optimization, Chebyshev polynomials have been used to help design accelerated algorithms for minimizing strongly convex quadratic functions, i.e., Chebyshev iteration (see, e.g., Section 2.3 in d'Aspremont et al. (2021)).

Algorithm 1: IDEAL HMC
1: Require: an initial point x_0 ∈ R^d, number of iterations K, and a scheme of integration time {η_k^(K)}.
2: for k = 1 to K do
3:   Sample velocity ξ ∼ N(0, I_d).
4:   Set (x_k, v_k) = HMC_{η_k^(K)}(x_{k-1}, ξ).
5: end for
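To make the role of the Chebyshev roots concrete, the sketch below compares a constant integration time against a time-varying schedule built from the Chebyshev roots of the interval [m, L]. The specific choice η_k = π/(2√γ_k) is a hypothetical instantiation for illustration only, not necessarily the paper's exact formula: it makes the per-step mean contraction cos(√λ η_k) of ideal HMC on a Gaussian target vanish at the root γ_k, mirroring how a Chebyshev step size 1/γ_k zeroes out the gradient-descent factor (1 − λ/γ_k). All constants are illustrative.

```python
import numpy as np

m, L, K = 1.0, 100.0, 60   # strong convexity, smoothness, number of HMC iterations

# Roots of the degree-K Chebyshev polynomial, rescaled from [-1, 1] to [m, L].
j = np.arange(1, K + 1)
gamma = (L + m) / 2 + (L - m) / 2 * np.cos((2 * j - 1) * np.pi / (2 * K))

# Hypothetical time-varying integration times vs. a constant-time baseline.
eta_cheb = np.pi / (2 * np.sqrt(gamma))
eta_const = np.full(K, np.pi / (2 * np.sqrt(L)))

# Eigenvalues of the quadratic potential lie in [m, L]; per eigen-direction,
# one ideal HMC step with full velocity refreshment contracts the mean by
# cos(sqrt(lam) * eta), so K steps contract it by the product below.
lam = np.linspace(m, L, 2000)

def worst_contraction(etas):
    rho = np.ones_like(lam)
    for eta in etas:
        rho *= np.cos(np.sqrt(lam) * eta)
    return np.abs(rho).max()   # worst case over the eigenvalue range

print("constant eta :", worst_contraction(eta_const))
print("Chebyshev eta:", worst_contraction(eta_cheb))
```

On this grid the time-varying schedule drives the worst-case contraction far below the constant-time baseline, which is the kind of condition-number improvement the analysis formalizes.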
Our result of accelerating HMC via the proposed Chebyshev integration time can be viewed as the sampling counterpart of acceleration in optimization. Interestingly, for minimizing strongly convex quadratic functions, acceleration of vanilla gradient descent can be achieved via a scheme of step sizes based on a Chebyshev polynomial, see, e.g., Agarwal et al. (2021), and our work is inspired by a nice blog article by Pedregosa (2021). Hence, our acceleration result for HMC can also be viewed as a counterpart in this sense. In addition to our theoretical findings, we conduct experiments on sampling from a Gaussian as well as from distributions whose potentials are not quadratic, including a mixture of two Gaussians, Bayesian logistic regression, and a hard distribution proposed in Lee et al. (2021a) for establishing lower-bound results for certain Metropolized sampling methods. Experimental results show that our proposed time-varying integration time also leads to better performance compared to the constant integration time of Chen & Vempala (2019) and Vishnoi (2021) when sampling from distributions whose potential functions are not quadratic. We conjecture that our proposed time-varying integration time also helps accelerate HMC for sampling from general log-smooth and strongly log-concave distributions, and we leave the analysis of such cases for future work.
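For the optimization counterpart mentioned above, here is a minimal sketch of gradient descent on a strongly convex quadratic with step sizes 1/γ_k given by the Chebyshev roots of [m, L], compared against the classical constant step size 2/(L + m). The diagonal quadratic, dimensions, and constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, L, K, d = 1.0, 100.0, 40, 50

# Quadratic f(x) = 0.5 * x^T A x with A diagonal, spectrum spread over [m, L].
eig = np.linspace(m, L, d)
x0 = rng.standard_normal(d)

# Chebyshev step sizes: 1/gamma_k, with gamma_k the roots of the degree-K
# Chebyshev polynomial rescaled from [-1, 1] to [m, L].
k = np.arange(1, K + 1)
gamma = (L + m) / 2 + (L - m) / 2 * np.cos((2 * k - 1) * np.pi / (2 * K))
steps_cheb = 1.0 / gamma
steps_const = np.full(K, 2.0 / (L + m))   # classical fixed step size

def run_gd(steps):
    x = x0.copy()
    for eta in steps:
        x = x - eta * eig * x   # gradient of f is A x (A diagonal)
    return np.linalg.norm(x)

print("constant step :", run_gd(steps_const))
print("Chebyshev step:", run_gd(steps_cheb))
```

Since the product of the contraction factors (1 − λ/γ_k) is the rescaled Chebyshev polynomial, the final error is order-independent; in finite precision, though, the ordering of the steps matters for stability, a point discussed in Pedregosa's blog article.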

2.1. HAMILTONIAN MONTE CARLO (HMC)

Suppose we want to sample from a target probability distribution ν(x) ∝ exp(-f(x)) on R^d, where f : R^d → R is a continuous function which we refer to as the potential. Denote by x ∈ R^d the position and v ∈ R^d the velocity of a particle. In this paper, we consider the standard Hamiltonian of the particle (Chen & Vempala, 2019; Neal, 2012), which is defined as

H(x, v) := f(x) + (1/2)‖v‖²,    (1)

while we refer the readers to Girolami & Calderhead (2011); Hirt et al. (2021); Brofos & Lederman (2021) and the references therein for other notions of the Hamiltonian. The Hamiltonian flow generated by H is the flow of the particle which evolves according to the following differential equations:

dx/dt = ∂H/∂v(x, v),   dv/dt = -∂H/∂x(x, v).    (2)

For the standard Hamiltonian defined in (1), the Hamiltonian flow becomes dx/dt = v and dv/dt = -∇f(x).
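As a quick sanity check of these dynamics: for a one-dimensional quadratic potential f(x) = (λ/2)x², the flow dx/dt = v, dv/dt = -λx is a rotation with frequency √λ, so x(t) = x(0)cos(√λ t) + v(0)sin(√λ t)/√λ, and H(x, v) is conserved along the flow. The sketch below (all constants arbitrary) verifies both facts numerically with a leapfrog discretization.

```python
import numpy as np

lam = 4.0            # curvature of the quadratic potential f(x) = (lam/2) x^2
x0, v0 = 1.3, -0.7   # initial position and velocity
T, n = 2.0, 20_000   # total integration time and number of leapfrog steps
h = T / n

# Leapfrog (kick-drift-kick) discretization of dx/dt = v, dv/dt = -lam * x.
x, v = x0, v0
H_start = 0.5 * lam * x**2 + 0.5 * v**2
for _ in range(n):
    v -= 0.5 * h * lam * x
    x += h * v
    v -= 0.5 * h * lam * x
H_end = 0.5 * lam * x**2 + 0.5 * v**2

# Closed-form solution of the flow for a quadratic potential (a rotation).
w = np.sqrt(lam)
x_exact = x0 * np.cos(w * T) + v0 * np.sin(w * T) / w

print("energy drift   :", abs(H_end - H_start))
print("position error :", abs(x - x_exact))
```

Both printed quantities are tiny, reflecting that leapfrog nearly conserves the Hamiltonian and that its trajectory tracks the exact rotation; ideal HMC assumes this flow is run exactly.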

