LEARNING A MINIMAX OPTIMIZER: A PILOT STUDY

Abstract

Solving continuous minimax optimization is of extensive practical interest, yet notoriously unstable and difficult. This paper introduces the learning to optimize (L2O) methodology to the minimax problems for the first time and addresses its accompanying unique challenges. We first present Twin-L2O, the first dedicated minimax L2O framework consisting of two LSTMs for updating min and max variables separately. The decoupled design is found to facilitate learning, particularly when the min and max variables are highly asymmetric. Empirical experiments on a variety of minimax problems corroborate the effectiveness of Twin-L2O. We then discuss a crucial concern of Twin-L2O, i.e., its inevitably limited generalizability to unseen optimizees. To address this issue, we present two complementary strategies. Our first solution, Enhanced Twin-L2O, is empirically applicable for general minimax problems, by improving L2O training via leveraging curriculum learning. Our second alternative, called Safeguarded Twin-L2O, is a preliminary theoretical exploration stating that under some strong assumptions, it is possible to theoretically establish the convergence of Twin-L2O. We benchmark our algorithms on several testbed problems and compare against state-of-the-art minimax solvers.

1. INTRODUCTION

Many popular applications can be formulated as continuous minimax optimization problems, such as generative adversarial networks (GANs) (Goodfellow et al., 2014), distributionally robust learning (Globerson & Roweis, 2006), domain adaptation (Ganin & Lempitsky, 2014), distributed computing (Shamma, 2008; Mateos et al., 2010), and privacy protection (Wu et al., 2018; 2020), among many more.

This paper studies such problems. We consider a cost function f : R^m × R^n → R and the min-max game

    min_x max_y f(x, y).    (1)

We aim to find a saddle point (x*, y*) of f, i.e., a point satisfying

    f(x*, y) ≤ f(x*, y*) ≤ f(x, y*),  ∀(x, y) ∈ X × Y,

where X ⊂ R^m and Y ⊂ R^n. If X = R^m and Y = R^n, (x*, y*) is called a global saddle point; if X × Y is a neighborhood of (x*, y*), it is a local saddle point.

The main challenge in solving problem (1) is the unstable dynamics of iterative algorithms. Even the simplest algorithms, such as gradient descent ascent (GDA), can cycle around the saddle point or even diverge (Benaïm & Hirsch, 1999; Mertikopoulos et al., 2018b; Lin et al., 2019). Many works have recently been developed to address this issue (Daskalakis et al., 2018; Daskalakis & Panageas, 2018; Liang & Stokes, 2019; Mertikopoulos et al., 2018a; Gidel et al., 2018; Mokhtari et al., 2019). However, convergence remains sensitive to the hyperparameters of these algorithms: even if the cost function is only rescaled, those parameters must be re-tuned to ensure convergence.

A recent trend, learning to optimize (L2O), parameterizes optimization algorithms so that they can be learned from data; the meta-learned optimizers can then be adapted to a special class of functions and outperform general-purpose optimizers. This is particularly meaningful when one has to solve a large number of similar optimization problems repeatedly and quickly. Specifically, among existing L2O methods that operate in the space of continuous optimization, almost all of them solve some
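The divergence of simultaneous GDA can be seen on a standard toy problem (this example is our illustration, not an experiment from the paper): the bilinear objective f(x, y) = xy, whose unique saddle point is (0, 0). Each simultaneous GDA step multiplies the distance to the saddle point by sqrt(1 + lr^2), so the iterates spiral outward rather than converge. A minimal sketch:

```python
def gda(x, y, lr=0.1, steps=100):
    """Simultaneous gradient descent ascent on f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x                        # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy      # descend in x, ascend in y
    return x, y

x, y = gda(1.0, 1.0)
dist = (x ** 2 + y ** 2) ** 0.5
print(dist)  # strictly larger than the initial distance sqrt(2): divergence
```

Here the learning rate `lr` and step count are arbitrary illustrative choices; any positive step size exhibits the same outward spiral, since ||z_{k+1}||^2 = (1 + lr^2) ||z_k||^2 for this objective.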

