ENHANCED FIRST AND ZEROTH ORDER VARIANCE REDUCED ALGORITHMS FOR MIN-MAX OPTIMIZATION

Anonymous

Abstract

Min-max optimization captures many important machine learning problems, such as robust adversarial learning and inverse reinforcement learning, and nonconvex-strongly-concave min-max optimization has been an active line of research. Specifically, a novel variance reduction algorithm, SREDA, was proposed recently by Luo et al. (2020) to solve such a problem, and was shown to achieve the optimal complexity dependence on the required accuracy level $\epsilon$. Despite this superior theoretical performance, the convergence guarantee of SREDA requires a stringent initialization accuracy and an $\epsilon$-dependent stepsize for controlling the per-iteration progress, so that SREDA can run very slowly in practice. This paper develops a novel analytical framework that guarantees SREDA's optimal complexity performance for a much enhanced algorithm, SREDA-Boost, which has a less restrictive initialization requirement and an accuracy-independent (and much larger) stepsize. Hence, SREDA-Boost runs substantially faster in experiments than SREDA. We further apply SREDA-Boost to propose a zeroth-order variance reduction algorithm, named ZO-SREDA-Boost, for the scenario that has access only to information about function values, not gradients, and show that ZO-SREDA-Boost outperforms the best known complexity dependence on $\epsilon$. This is the first study that applies the variance reduction technique to zeroth-order algorithms for min-max optimization problems.

1. INTRODUCTION

Min-max optimization has attracted significantly growing attention in machine learning, as it captures several important machine learning models and problems, including generative adversarial networks (GANs) Goodfellow et al. (2014), robust adversarial machine learning Madry et al. (2018), imitation learning Ho & Ermon (2016), etc. Min-max optimization typically takes the following form:

$$\min_{x\in\mathbb{R}^{d_1}} \max_{y\in\mathbb{R}^{d_2}} f(x,y), \quad \text{where } f(x,y) = \begin{cases} \mathbb{E}[F(x,y;\xi)] & \text{(online case)} \\ \frac{1}{n}\sum_{i=1}^{n} F(x,y;\xi_i) & \text{(finite-sum case)} \end{cases} \tag{1}$$

where f(x, y) takes the expectation form if data samples ξ arrive in an online fashion, and takes the finite-sum form if a dataset of training samples ξ_i for i = 1, ..., n is given in advance. This paper focuses on the nonconvex-strongly-concave min-max problem, in which f(x, y) is nonconvex with respect to x for all y ∈ ℝ^{d_2}, and f(x, y) is µ-strongly concave with respect to y for all x ∈ ℝ^{d_1}. The problem then takes the following equivalent form:

$$\min_{x\in\mathbb{R}^{d_1}} \Phi(x) \triangleq \max_{y\in\mathbb{R}^{d_2}} f(x,y). \tag{2}$$

The objective function Φ(·) in eq. (2) is nonconvex in general, and hence algorithms for solving eq. (2) are expected to attain an approximate (i.e., $\epsilon$-accurate) first-order stationary point. The convergence of deterministic algorithms for solving eq. (2) has been established in Jin et al. (2019); Nouiehed et al. (2019); Thekumparampil et al. (2019); Lu et al. (2020). SGD-type stochastic algorithms have also been proposed to solve such problems more efficiently, including SGDmax Jin et al. (2019),
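To make the formulation in eq. (1) concrete, the following sketch runs plain stochastic gradient descent-ascent (GDA) on a toy nonconvex-strongly-concave objective. This is an illustrative baseline only, not SREDA or SREDA-Boost; the objective, stepsizes, and noise model are assumptions chosen for the example, and a variance-reduced method would replace the raw stochastic gradients below with recursive gradient estimators.

```python
import numpy as np

# Toy objective (an assumption for illustration, not from the paper):
#   f(x, y) = cos(x) + x*y - 0.5*mu*y**2
# f is nonconvex in x (the cosine term) and mu-strongly concave in y.
mu = 1.0

def grad_x(x, y):
    # partial derivative of f with respect to x
    return -np.sin(x) + y

def grad_y(x, y):
    # partial derivative of f with respect to y
    return x - mu * y

def gda(x0, y0, lr_x=0.05, lr_y=0.5, iters=2000, noise=0.01, seed=0):
    """Stochastic GDA: descend on x, ascend on y, with noisy gradients."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for _ in range(iters):
        gx = grad_x(x, y) + noise * rng.standard_normal()
        gy = grad_y(x, y) + noise * rng.standard_normal()
        y = y + lr_y * gy  # ascent step for the strongly concave player
        x = x - lr_x * gx  # descent step for the nonconvex player
    return x, y

x, y = gda(1.0, 0.0)
# With y at its best response y*(x) = x/mu, the stationarity measure for
# Phi(x) = max_y f(x, y) is |d/dx Phi(x)| = |-sin(x) + x/mu|.
print(abs(-np.sin(x) + x / mu))
```

Because µ-strong concavity makes the inner maximizer y*(x) unique and easy to track, the ascent stepsize lr_y can be much larger than lr_x; this timescale separation is a common heuristic for nonconvex-strongly-concave problems.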

