ACCELERATION IN HYPERBOLIC AND SPHERICAL SPACES

Abstract

We advance research on the acceleration phenomenon on Riemannian manifolds by introducing the first global first-order method that achieves the same rates as accelerated gradient descent in the Euclidean space for the optimization of smooth and geodesically convex (g-convex) or strongly g-convex functions defined on the hyperbolic space or a subset of the sphere, up to constants and log factors. To the best of our knowledge, this is the first method that is proved to achieve these rates globally on functions defined on a Riemannian manifold M other than the Euclidean space. Additionally, for any Riemannian manifold of bounded sectional curvature, we provide reductions from optimization methods for smooth and g-convex functions to methods for smooth and strongly g-convex functions, and vice versa.
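For reference, the classical Euclidean rates being matched (up to constants and log factors) are the standard AGD guarantees: for an $L$-smooth convex function $f$ with minimizer $x^*$,

    $f(x_T) - f(x^*) = O\left( L \|x_0 - x^*\|^2 / T^2 \right)$,

and, for an $L$-smooth and $\mu$-strongly convex $f$, an $\epsilon$-minimizer is reached after $O(\sqrt{L/\mu} \log(1/\epsilon))$ iterations.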

1. INTRODUCTION

Acceleration in convex optimization is a phenomenon that has drawn much attention and has yielded many important results since the renowned Accelerated Gradient Descent (AGD) method of Nesterov (1983). Having proved successful in deep learning Sutskever et al. (2013), among other fields, this phenomenon has prompted recent efforts to understand it better Allen-Zhu & Orecchia (2017); Diakonikolas & Orecchia (2019); Su et al. (2016); Wibisono et al. (2016). These efforts have yielded numerous new results going beyond convexity or the standard oracle model, in a wide variety of settings Allen-Zhu (2017; 2018a;b); Allen-Zhu & Orecchia (2015); Allen-Zhu et al. (2016); Allen-Zhu et al. (2017); Carmon et al. (2017); Cohen et al. (2018); Cutkosky & Sarlós (2019); Diakonikolas & Jordan (2019); Diakonikolas & Orecchia (2018); Gasnikov et al. (2019); Wang et al.

This surge of research that applies tools of convex optimization to models going beyond convexity has been fruitful. One of these models is the setting of geodesically convex Riemannian optimization. In this setting, the function to optimize is geodesically convex (g-convex), i.e. convex restricted to any geodesic (cf. Definition 1.1; a standard formal statement is recalled after this paragraph). Riemannian optimization, g-convex and non-g-convex alike, is an extensive area of research. In recent years there have been numerous efforts towards obtaining Riemannian optimization algorithms that share properties analogous to those of the more broadly studied Euclidean first-order methods: deterministic de Carvalho Bento et al. (2017); Wei et al. (2016); Zhang & Sra (2016), stochastic Hosseini & Sra (2017); Khuzani & Li (2017); Tripuraneni et al. (2018), variance-reduced Sato et al. (2017; 2019); Zhang et al. (2016), adaptive Kasai et al. (2019), saddle-point-escaping Criscitiello & Boumal (2019; 2020); Sun et al. (2019); Zhang et al. (2018); Zhou et al. (2019), and projection-free methods Weber & Sra (2017; 2019), among others. Unsurprisingly, Riemannian optimization has found many applications in machine learning, including low-rank matrix completion Cambier & Absil (2016); Heidel & Schulz (2018); Mishra & Sepulchre (2014); Tan et al. (2014); Vandereycken (2013), dictionary learning Cherian & Sra (2017); Sun et al. (2017), optimization under orthogonality constraints Edelman et al. (1998), with applications to Recurrent Neural Networks Lezcano-Casado (2019); Lezcano-Casado & Martínez-Rubio (2019), robust covariance estimation in Gaussian distributions Wiesel (2012), Gaussian mixture models Hosseini & Sra (2015), operator scaling Allen-Zhu et al. (2018), and sparse principal component analysis Genicot et al. (2015); Huang & Wei (2019b); Jolliffe et al. (2003).
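For concreteness, we recall the notions above in their usual textbook form: a function $f : \mathcal{M} \to \mathbb{R}$ on a Riemannian manifold $\mathcal{M}$ is g-convex if for every geodesic $\gamma : [0,1] \to \mathcal{M}$ and every $t \in [0,1]$ it satisfies

    $f(\gamma(t)) \leq (1-t) f(\gamma(0)) + t f(\gamma(1))$,

and it is $\mu$-strongly g-convex if, with $d$ the Riemannian distance,

    $f(\gamma(t)) \leq (1-t) f(\gamma(0)) + t f(\gamma(1)) - \frac{\mu}{2} t(1-t)\, d(\gamma(0), \gamma(1))^2$.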

