TOWARDS CONVERGENCE TO NASH EQUILIBRIA IN TWO-TEAM ZERO-SUM GAMES

Abstract

Contemporary applications of machine learning in two-team e-sports and the superior expressivity of multi-agent generative adversarial networks raise important and overlooked theoretical questions regarding optimization in two-team games. Formally, two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents, each experiencing a utility identical to that of their teammates and opposite to that of the opposing team. We focus on the solution concept of Nash equilibria (NE). We first show that computing NE for this class of games is hard for the complexity class CLS. To further examine the capabilities of online learning algorithms in games with full-information feedback, we propose a benchmark of a simple, yet nontrivial, family of such games. These games do not enjoy the properties used to prove convergence for relevant algorithms. In particular, we use a dynamical systems perspective to demonstrate that gradient descent-ascent, its optimistic variant, optimistic multiplicative weights update, and extra gradient fail to converge (even locally) to a Nash equilibrium. On a brighter note, we propose a first-order method that leverages control theory techniques and, under some conditions, enjoys last-iterate local convergence to a Nash equilibrium. We also believe our proposed method is of independent interest for general min-max optimization.

1. INTRODUCTION

Online learning shares an enduring relationship with game theory that has a very early onset, dating back to the analysis of fictitious play by Robinson (1951) and Blackwell's approachability theorem (Blackwell, 1956). A key question within this context is whether self-interested agents can arrive at a game-theoretic equilibrium in an independent and decentralized manner with only limited feedback from their environment. Learning dynamics that converge to different notions of equilibria are known to exist for two-player zero-sum games (Robinson, 1951; Arora et al., 2012; Daskalakis et al., 2011), potential games (Monderer & Shapley, 1996), near-potential games (Anagnostides et al., 2022b), socially concave games (Golowich et al., 2020), and extensive-form games (Anagnostides et al., 2022a). We try to push the boundary further and explore whether equilibria, in particular Nash equilibria, can be reached by agents that follow decentralized learning algorithms in two-team zero-sum games.

Team competition has played a central role in the development of game theory (Marschak, 1955; von Stengel & Koller, 1997; Bacharach, 1999; Gold, 2005), economics (Marschak, 1955; Gottinger, 1974), and evolutionary biology (Nagylaki, 1993; Nowak et al., 2004). Recently, competition among teams has attracted the interest of the machine learning community due to the advances that multi-agent systems have accomplished: e.g., multi-GANs (Hoang et al., 2017; Hardy et al., 2019) for generative tasks, adversarial regression with multiple learners (Tong et al., 2018), and AI agents competing in e-sports (e.g., CTF (Jaderberg et al., 2019) or Starcraft (Vinyals et al., 2019)) as well as card games (Moravčík et al., 2017; Brown & Sandholm, 2018; Bowling et al., 2015).

Our class of games. We turn our attention to two-team zero-sum games, a quite general class of min-max optimization problems that includes bilinear games as well as a wide range of nonconvex-nonconcave games.
In this class of games, players fall into two teams of sizes n and m and submit their own randomized strategy vectors independently. We note that the games we focus on are not restricted to team games in the narrow sense of the term "team" as we use it in sports, games, and so on; the players play independently and do not follow a central coordinating authority. Rather, for the purposes of this paper, teams are constituted by agents that merely enjoy the same utility function. This might already hint that the solution concept we engage with is the Nash equilibrium (NE). Another class of games captured by this framework is that of adversarial potential games. In these games, the condition that all players of the same team experience the same utility is weakened, as long as there exists a potential function that can track differences in the utility of each player when they unilaterally deviate from a given strategy profile (see Appendix A.2 for a formal definition). A similar setting has been studied in the context of nonatomic games (Babaioff et al., 2009).

Positive duality gap. In two-player zero-sum games, i.e., n = m = 1, min-max (respectively, max-min) strategies are guaranteed to form a Nash equilibrium due to von Neumann's min-max theorem (Von Neumann, 1928), ultimately endowing the game with a unique value. The challenges arise for the case of n, m > 1: Schulman & Vazirani (2019b) prove that, in general, two-team games do not have a unique value. They do so by presenting a family of team games with a positive duality gap, together with bounds concerning this gap. These bounds quantify the effect of exchanging the order of commitment to a strategy, either between the teams as a whole or between the individual players.

Solution concept. In this work, we examine the solution concept of Nash equilibrium (NE). Under a Nash equilibrium, no player can improve their utility by unilaterally deviating.
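To make the setup concrete, the following is a minimal sketch of how utilities decompose in a two-team zero-sum game. The team sizes, action sets, and payoff tensor below are illustrative choices, not taken from the paper: every player on team A receives the same payoff U, every player on team B receives -U, and each player mixes independently, so the expected payoff is multilinear in the individual strategy vectors.

```python
import itertools
import random

random.seed(0)

# Illustrative sizes: two players per team (n = m = 2), two actions each.
ACTIONS = [0, 1]

# U[(a1, a2, b1, b2)] is team A's payoff under a pure-strategy profile;
# every team B player receives -U, so the game is zero-sum across teams.
U = {profile: random.uniform(-1.0, 1.0)
     for profile in itertools.product(ACTIONS, repeat=4)}

def team_a_payoff(x1, x2, y1, y2):
    """Expected payoff of team A when every player mixes independently:
    x1, x2 (team A) and y1, y2 (team B) are distributions over ACTIONS."""
    return sum(x1[a1] * x2[a2] * y1[b1] * y2[b2] * u
               for (a1, a2, b1, b2), u in U.items())

# Each player submits their own randomized strategy vector independently;
# no coordination device correlates teammates' actions.
uniform = [0.5, 0.5]
value_a = team_a_payoff(uniform, uniform, uniform, uniform)
value_b = -value_a  # teammates share utility; the opposing team gets its negation
```

A Nash equilibrium in this setting is a profile (x1, x2, y1, y2) from which no single player, keeping the other three strategies fixed, can improve their own expected payoff by changing only their own distribution.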
The main downside of a NE for team games is the fact that such an equilibrium can be arbitrarily suboptimal for the team (Basilico et al., 2017a). This is one of the reasons that the solution concept of team-maxmin equilibrium with a coordination device (TMECor) has dominated the contemporary literature on team games, especially with regard to applications (Farina et al., 2018; Zhang et al., 2020; Cacciamani et al., 2021). Under a TMECor, players are allowed to communicate before the game and decide upon combinations of strategies to be played during the game using an external source of randomness. The undeniable advantage of a TMECor is that the expected utility of the team under it is greater than the expected utility under a NE (Basilico et al., 2017a). Nevertheless, this favorable property of TMECor by no means renders the study of NE irrelevant. In fact, the study of NE is always of independent interest within the literature of algorithmic game theory, especially for questions concerning computational complexity. Moreover, there exist settings in which ex ante coordination cannot be expected to be possible or even sensible, for example: (i) environments where the external sources of randomness are unreliable, nonexistent, or visible to the adversarial team; (ii) games in which players cannot know in advance with whom they share a common utility; (iii) adversarial potential games.
These games can model naturally occurring settings such as (a) security games with multiple uncoordinated defenders versus multiple, similarly uncoordinated, attackers; (b) the load-balancing "game" between telecommunication service providers, who try to minimize the maximum delay of service experienced by their customers, and the service users, who each try to utilize the maximum amount of bandwidth possible; and (c) the weak selection model of evolutionary biology, where a species as a whole is a team, the genes of its population are the players, and the alleles of each gene are in turn the actions of a player; the allele frequencies are independent across genes (Nagylaki, 1993; Nowak et al., 2004; Mehta et al., 2015). Concluding, we could not possibly argue for a single correct solution concept for two-team games; there is no silver bullet. Rather, one has to assess which is the most fitting based on the constraints of a given setting. A Nash equilibrium is a cornerstone concept of game theory, and examining its properties in different games is always important.

The optimization point of view. We focus on the solution concept of NE, and we first note that computing a local NE in general nonconvex-nonconcave games is PPAD-complete (Daskalakis et al., 2009; 2021). Thus, all well-celebrated online learning first-order methods, like gradient descent-ascent (Lin et al., 2020; Daskalakis & Panageas, 2019), its optimistic variant (Popov, 1980; Chiang et al., 2012; Sridharan & Tewari, 2010), optimistic multiplicative weights update (Sridharan, 2012), and the extra-gradient method (Korpelevich, 1976), would require a number of steps exponential in the parameters of the problem in order to compute an approximate NE under the oracle optimization model of Nemirovskij & Yudin (1983). Additionally, in the continuous time regime, similar classes

