GRADIENT DESCENT ASCENT FOR MIN-MAX PROBLEMS ON RIEMANNIAN MANIFOLDS

Anonymous

Abstract

In this paper, we study a class of useful non-convex minimax optimization problems on Riemannian manifolds and propose a class of Riemannian gradient descent ascent algorithms to solve them. Specifically, we propose a new Riemannian gradient descent ascent (RGDA) algorithm for deterministic minimax optimization. We prove that RGDA has a sample complexity of O(κ²ε⁻²) for finding an ε-stationary point of nonconvex strongly-concave minimax problems, where κ denotes the condition number. We further introduce a Riemannian stochastic gradient descent ascent (RSGDA) algorithm for stochastic minimax optimization, and prove that RSGDA achieves a sample complexity of O(κ⁴ε⁻⁴). To further reduce the sample complexity, we propose a novel momentum variance-reduced Riemannian stochastic gradient descent ascent (MVR-RSGDA) algorithm based on the momentum variance-reduction technique of STORM. We prove that MVR-RSGDA achieves a lower sample complexity of Õ(κ⁴ε⁻³) without requiring large batches, which nearly matches the best known sample complexity of its Euclidean counterparts. Extensive experimental results on robust training of deep neural networks over the Stiefel manifold demonstrate the efficiency of our proposed algorithms.
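The abstract measures convergence in terms of an ε-stationary point. That notion is not spelled out in this excerpt; the following is a sketch of the standard definition used for nonconvex strongly-concave minimax problems on a manifold (the symbols Φ, grad, and ‖·‖ₓ are the conventional ones, not taken from the text):

```latex
\[
  \Phi(x) \;:=\; \max_{y \in \mathcal{Y}} f(x, y),
  \qquad
  x \in \mathcal{M} \text{ is an } \epsilon\text{-stationary point if }
  \big\| \operatorname{grad} \Phi(x) \big\|_{x} \le \epsilon,
\]
% where grad Phi(x) is the Riemannian gradient of Phi at x, and
% ||.||_x is the norm induced by the Riemannian metric at x.
```

Under this definition, the stated complexities count the number of (stochastic) gradient evaluations needed to drive the Riemannian gradient norm of Φ below ε.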

1. INTRODUCTION

In this paper, we study a class of useful non-convex minimax (a.k.a. min-max) problems on a Riemannian manifold M, defined as

min_{x ∈ M} max_{y ∈ Y} f(x, y),   (1)

where the function f(x, y) is µ-strongly concave in y but possibly nonconvex in x. Here Y ⊆ R^d is a closed convex set; f(·, y) : M → R is, for every y ∈ Y, a smooth but possibly nonconvex real-valued function on the manifold M, and f(x, ·) : Y → R is, for every x ∈ M, a smooth and (strongly) concave real-valued function. We mainly focus on the stochastic minimax optimization problem f(x, y) := E_{ξ∼D}[f(x, y; ξ)], where ξ is a random variable following an unknown distribution D. In fact, problem (1) covers many existing machine learning applications:

1) Robust training of DNNs over a Riemannian manifold. Deep Neural Networks (DNNs) have recently demonstrated exceptional performance on many machine learning applications. However, they are vulnerable to adversarial example attacks: a small perturbation of the data input can significantly change the output of a DNN. The security properties of DNNs have therefore been widely studied, and one line of research aims to enhance the robustness of DNNs against adversarial example attacks. To be more specific, given training data D := {ξ_i = (a_i, b_i)}_{i=1}^n, where a_i ∈ R^d and b_i ∈ R denote the features and label of sample ξ_i, respectively, each data sample a_i can be corrupted by a universal small perturbation vector y to generate an adversarial attack sample a_i + y, as in (Moosavi-Dezfooli et al., 2017; Chaubey et al., 2020). To make DNNs robust against adversarial attacks, one popular approach is to solve the following robust training problem:

min_{x ∈ M} max_{y ∈ Y} (1/n) Σ_{i=1}^n ℓ(h(a_i + y; x), b_i),   (2)

where h(·; x) denotes the DNN with parameters x constrained to the manifold M, and ℓ is the loss function.

