ACCELERATED RIEMANNIAN OPTIMIZATION: HANDLING CONSTRAINTS TO BOUND GEOMETRIC PENALTIES

Abstract

We propose a globally accelerated, first-order method for the optimization of smooth, geodesically convex (possibly strongly so) functions in Hadamard manifolds. Our algorithm enjoys the same convergence rates as Nesterov's accelerated gradient descent, up to a multiplicative geometric penalty and log factors. Crucially, we can enforce our method to stay within a compact set that we define. Prior fully accelerated works resort to assuming that the iterates of their algorithms stay in some pre-specified compact set, except for two previous methods, whose applicability is limited to local optimization and to spaces of constant curvature, respectively. Achieving global and general Riemannian acceleration without assuming that the iterates stay in some pre-specified feasible set was posed as an open question in (Kim & Yang, 2022), which we solve for Hadamard manifolds. In our solution, we show that a linearly convergent algorithm for constrained, smooth, strongly g-convex problems can be used to implement a Riemannian inexact proximal point operator, which we use as a subroutine and which is of independent interest.
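To illustrate this last point schematically (the notation here is generic and may differ from the notation fixed later in the paper), recall that the Riemannian proximal point operator of a function $f$ at a point $x$ of a manifold $\mathcal{M}$, with Riemannian distance $d$, feasible set $\mathcal{X} \subseteq \mathcal{M}$, and step size $\lambda > 0$, is
\[
    \operatorname{prox}_{\lambda f}(x) \in \operatorname*{arg\,min}_{y \in \mathcal{X}} \Big\{ f(y) + \frac{1}{2\lambda}\, d(x, y)^2 \Big\}.
\]
In a Hadamard manifold the squared distance $y \mapsto d(x, y)^2$ is $2$-strongly g-convex, so for g-convex $f$ the subproblem above is $(1/\lambda)$-strongly g-convex over $\mathcal{X}$, which is what allows a linearly convergent solver for constrained, smooth, strongly g-convex problems to approximate this operator.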

1. INTRODUCTION

Riemannian optimization concerns the optimization of a function defined over a Riemannian manifold. It is motivated by constrained problems that can be naturally expressed on Riemannian manifolds, which allows one to exploit the geometric structure of the problem and effectively transforms it into an unconstrained one. Moreover, there are problems that are not convex in the Euclidean setting but that, when posed as problems over a manifold with the right metric, are convex when restricted to every geodesic, and this allows for fast optimization (Cruz Neto et al., 2006; Carvalho Bento & Melo, 2012; Bento et al., 2015; Allen-Zhu et al., 2018). That is, they are geodesically convex (g-convex) problems, cf. Definition 1.1 (recalled at the end of this section).

Some applications of Riemannian optimization in machine learning include dictionary learning (Cherian & Sra, 2017; Sun et al., 2017), robust covariance estimation in Gaussian distributions (Wiesel, 2012), Gaussian mixture models (Hosseini & Sra, 2015), operator scaling (Allen-Zhu et al., 2018), computation of Brascamp-Lieb constants (Bennett et al., 2008), Karcher means (Zhang et al., 2016), Wasserstein barycenters (Weber & Sra, 2017), low-rank matrix completion (Cambier & Absil, 2016; Heidel & Schulz, 2018; Mishra & Sepulchre, 2014; Tan et al., 2014; Vandereycken, 2013), optimization under orthogonality constraints (Edelman et al., 1998; Lezcano-Casado & Martínez-Rubio, 2019), and sparse principal component analysis (Genicot et al., 2015; Huang & Wei, 2019b; Jolliffe et al., 2003). The first seven problems are defined over Hadamard manifolds, which are the manifolds we consider in this work. In fact, the optimization in these cases is over symmetric spaces, which satisfy a property that one instance of our algorithm requires, cf. Theorem 2.4.

Riemannian optimization, whether under g-convexity or not, is an extensive and active area of research, in which one aspires to develop Riemannian algorithms with properties analogous to those of the more broadly studied Euclidean methods, such as the following kinds of Riemannian first-order methods: deterministic (Bento et al., 2017; Wei et al., 2016; Zhang & Sra, 2016), adaptive
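For concreteness, we recall the standard notion of geodesic convexity anticipated above, in the spirit of Definition 1.1 (whose formal statement appears later): a function $f : \mathcal{M} \to \mathbb{R}$ on a Riemannian manifold $\mathcal{M}$ is geodesically convex if for every geodesic $\gamma : [0, 1] \to \mathcal{M}$ we have
\[
    f(\gamma(t)) \le (1 - t)\, f(\gamma(0)) + t\, f(\gamma(1)) \quad \text{for all } t \in [0, 1],
\]
i.e., $f$ is convex along every geodesic. Strong g-convexity with parameter $\mu > 0$ additionally subtracts the term $\frac{\mu}{2}\, t (1 - t)\, d(\gamma(0), \gamma(1))^2$ from the right-hand side, where $d$ is the Riemannian distance.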



Most of the notation in this work is hyperlinked to its definition. For example, following any instance of 𝐿 leads to the place where it is defined as the smoothness constant of the function we consider in this work.

