ACCELERATED RIEMANNIAN OPTIMIZATION: HANDLING CONSTRAINTS TO BOUND GEOMETRIC PENALTIES

Abstract

We propose a globally-accelerated, first-order method for the optimization of smooth and (strongly or not) geodesically-convex functions in Hadamard manifolds. Our algorithm enjoys the same convergence rates as Nesterov's accelerated gradient descent, up to a multiplicative geometric penalty and log factors. Crucially, we can enforce our method to stay within a compact set we define. Prior fully accelerated works resort to assuming that the iterates of their algorithms stay in some pre-specified compact set, except for two previous methods, whose applicability is limited to local optimization and to spaces of constant curvature, respectively. Achieving global and general Riemannian acceleration without assuming that the iterates stay in the feasible set was posed as an open question in (Kim & Yang, 2022), which we solve for Hadamard manifolds. In our solution, we show that a linearly convergent algorithm for constrained strongly g-convex smooth problems can be used to implement a Riemannian inexact proximal point operator, which we use as a subroutine and which is of independent interest.

1. INTRODUCTION

Riemannian optimization concerns the optimization of a function defined over a Riemannian manifold. It is motivated by constrained problems that can be naturally expressed on Riemannian manifolds, allowing one to exploit the geometric structure of the problem and effectively transforming it into an unconstrained one. Moreover, there are problems that are not convex in the Euclidean setting but that, when posed as problems over a manifold with the right metric, are convex when restricted to every geodesic, which allows for fast optimization (Cruz Neto et al., 2006; Carvalho Bento & Melo, 2012; Bento et al., 2015; Allen-Zhu et al., 2018). That is, they are geodesically convex (g-convex) problems, cf. Definition 1.1. Some applications of Riemannian optimization in machine learning include dictionary learning (Cherian & Sra, 2017; Sun et al., 2017), robust covariance estimation in Gaussian distributions (Wiesel, 2012), Gaussian mixture models (Hosseini & Sra, 2015), operator scaling (Allen-Zhu et al., 2018), computation of Brascamp-Lieb constants (Bennett et al., 2008), Karcher mean (Zhang et al., 2016), Wasserstein barycenters (Weber & Sra, 2017), low-rank matrix completion (Cambier & Absil, 2016; Heidel & Schulz, 2018; Mishra & Sepulchre, 2014; Tan et al., 2014; Vandereycken, 2013), optimization under orthogonality constraints (Edelman et al., 1998; Lezcano-Casado & Martínez-Rubio, 2019), and sparse principal component analysis (Genicot et al., 2015; Huang & Wei, 2019b; Jolliffe et al., 2003). The first seven problems are defined over Hadamard manifolds, which we consider in this work. In fact, the optimization in these cases is over symmetric spaces, which satisfy a property that one instance of our algorithm requires, cf. Theorem 2.4.
Riemannian optimization, whether under g-convexity or not, is an extensive and active area of research, in which one aspires to develop Riemannian optimization algorithms that share properties analogous to those of the more broadly studied Euclidean methods, such as the following kinds of Riemannian first-order methods: deterministic (Bento et al., 2017; Wei et al., 2016; Zhang & Sra, 2016), adaptive (Kasai et al., 2019), projection-free (Weber & Sra, 2017; 2019), saddle-point-escaping (Criscitiello & Boumal, 2019; Sun et al., 2019; Zhou et al., 2019; Criscitiello & Boumal, 2020), stochastic (Hosseini & Sra, 2017; Khuzani & Li, 2017; Tripuraneni et al., 2018), variance-reduced (Sato et al., 2017; 2019; Zhang et al., 2016), and min-max methods (Zhang et al., 2022), among others. Riemannian generalizations of accelerated convex optimization are appealing due to their better convergence rates with respect to unaccelerated methods, especially in ill-conditioned problems. Acceleration in Euclidean convex optimization is a concept that has been broadly explored and has provided many different fast algorithms. A paradigmatic example is Nesterov's Accelerated Gradient Descent (AGD), cf. (Nesterov, 1983), which is considered the first general accelerated method; the conjugate gradient method can be seen as an accelerated predecessor in a more limited scope (Martínez-Rubio, 2021). There have been recent efforts to better understand this phenomenon in the Euclidean case (Allen Zhu & Orecchia, 2017; Su et al., 2016; Drori & Teboulle, 2014; Wibisono et al., 2016; Diakonikolas & Orecchia, 2019; Joulani et al., 2020), which have yielded some fruitful techniques for the general development of methods and analyses.
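For concreteness, the standard notions involved are the following, stated here in their usual form (the paper's Definition 1.1 may differ in minor details, such as restricting the domain to a g-convex set $\mathcal{X}$). A function $f:\mathcal{M}\to\mathbb{R}$ is g-convex if for every geodesic $\gamma:[0,1]\to\mathcal{M}$ and every $t\in[0,1]$,
\[
f(\gamma(t)) \;\le\; (1-t)\, f(\gamma(0)) + t\, f(\gamma(1)),
\]
and $\mu$-strongly g-convex if
\[
f(\gamma(t)) \;\le\; (1-t)\, f(\gamma(0)) + t\, f(\gamma(1)) - \frac{\mu}{2}\, t(1-t)\, d\big(\gamma(0),\gamma(1)\big)^2 .
\]
Similarly, the usual notion of geodesic $L$-smoothness requires $f(y) \le f(x) + \langle \nabla f(x), \log_x y\rangle + \tfrac{L}{2}\, d(x,y)^2$ for all $x, y$ in the domain, where $\log_x$ denotes the logarithmic map.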
These techniques have enabled a considerable number of new results going beyond the standard oracle model, beyond convexity, or beyond first-order methods, in a wide variety of settings (Tseng, 2008; Beck & Teboulle, 2009; Wang et al., 2016a; Allen Zhu & Orecchia, 2015; Allen-Zhu, 2017; 2018; Carmon et al., 2017; Diakonikolas & Orecchia, 2018; Hinder et al., 2019; Gasnikov et al., 2019; Ivanova et al., 2021; Kamzolov & Gasnikov, 2020; Criado et al., 2021), among many others. There have been some efforts to achieve acceleration for Riemannian algorithms as generalizations of AGD, cf. Section 1.3. These works try to answer the following fundamental question: can a Riemannian first-order method enjoy the same rates of convergence as Euclidean AGD? The question is posed under (possibly strong) geodesic convexity and smoothness of the function to be optimized. Moreover, due to the lower bound in (Criscitiello & Boumal, 2021), we know that the optimization must take place on a manifold of bounded sectional curvature, and we might have to optimize over a bounded domain.
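For reference, the Euclidean rates that this question asks to match are the standard AGD guarantees (recalled here for context; this is not a result of the present work): for an $L$-smooth convex $f$ with a minimizer $x^*$ at distance at most $R$ from the initial point $x_0$,
\[
f(x_k) - f(x^*) \;=\; O\!\left(\frac{L R^2}{k^2}\right), \qquad \text{i.e., } O\!\left(\sqrt{L R^2 / \epsilon}\right) \text{ iterations to reach accuracy } \epsilon,
\]
and for $L$-smooth, $\mu$-strongly convex $f$, convergence to accuracy $\epsilon$ in $O\!\left(\sqrt{L/\mu}\,\log(1/\epsilon)\right)$ iterations.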

Main result

In this work, we study the question above in the case of Hadamard manifolds ℳ of bounded sectional curvature and provide an instance of our framework for a wide class of Hadamard manifolds. For a differentiable 𝑓 : ℳ → R with a global minimizer at 𝑥 * , let 𝑥 0 ∈ ℳ be an initial point and 𝑅 be an upper bound on the distance 𝑑(𝑥 0 , 𝑥 * ). If 𝑓 is 𝐿-smooth and (possibly 𝜇-strongly) g-convex in a closed ball of center 𝑥 * and radius 𝑂(𝑅), our algorithms obtain the same rates of convergence as AGD, up to logarithmic factors and up to a geometric penalty factor, cf. Theorem 2.4. See Table 1 for a succinct comparison among accelerated algorithms and their rates. This algorithm is a consequence of the general accelerated scheme we design, Riemacon: given a (not necessarily accelerated) linearly-convergent subroutine for strongly g-convex smooth problems constrained to a geodesically convex set 𝒳 , we design first-order algorithms that enjoy the same rates as AGD when approximating min 𝑥∈𝒳 𝑓 (𝑥), up to logarithmic factors and up to a geometric penalty factor, where 𝑓 : 𝒩 ⊂ ℳ → R is a differentiable function that is smooth and g-convex (or strongly g-convex) in 𝒳 ⊂ 𝒩 , cf. Theorem 2.2. Importantly, our algorithms obtain acceleration without an undesirable assumption that most previous works had to make: that the iterates of the algorithm stay inside of a pre-specified compact set, without any mechanism for enforcing or guaranteeing this condition. To the best of our knowledge, only two previous methods are able to deal with some form of constraints, and they apply to the limited settings of local optimization (Criscitiello & Boumal, 2021) and constant sectional curvature manifolds (Martínez-Rubio, 2021), respectively. The techniques in the remaining papers resort to assuming that the iterates of their algorithms are always feasible. Removing this condition in general, global, and fully accelerated methods was posed as an open question in (Kim & Yang, 2022), which we solve for the case of Hadamard manifolds. The difficulty of constraining problems in order to bound geometric penalties, as well as the necessity of achieving this goal in order to provide full optimization guarantees with bounded geometric penalties, has also been noted in other kinds of Riemannian algorithms, cf. (Hosseini & Sra, 2020). We develop new techniques on inexact proximal methods in Riemannian manifolds and show that, with access to a (not necessarily accelerated) constrained linear subroutine for strongly g-convex and smooth problems, we can inexactly solve a proximal subproblem to enough accuracy so that it can be used in our accelerated outer loop, in the spirit of other Euclidean algorithms like Catalyst (Lin et al., 2015).
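To make the proximal-point construction concrete, each outer iteration inexactly solves, via the constrained subroutine, a subproblem of the schematic form below (the precise regularization parameter, accuracy requirement, and coupling with the accelerated outer loop are specified in Theorem 2.2 and the surrounding sections; this display is a generic template rather than the paper's exact statement):
\[
x_{k+1} \;\approx\; \operatorname*{arg\,min}_{x \in \mathcal{X}} \;\Big\{ f(x) + \frac{\lambda}{2}\, d(x, y_k)^2 \Big\},
\]
where 𝑦 𝑘 is an auxiliary point produced by the accelerated outer loop and 𝜆 > 0 is a regularization parameter. On a Hadamard manifold the map $x \mapsto \tfrac{1}{2} d(x, y_k)^2$ is $1$-strongly g-convex, so the subproblem is strongly g-convex and, over a compact g-convex set, also smooth; this is exactly the regime in which a linearly convergent constrained subroutine applies.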



Most of the notation in this work is hyperlinked to its definition. For example, clicking or tapping on any instance of 𝐿 jumps to the place where it is defined as the smoothness constant of the function we consider in this work.

