CHAOS OF LEARNING BEYOND ZERO-SUM AND COORDINATION VIA GAME DECOMPOSITIONS

Abstract

It is of primary interest for Machine Learning to understand how agents learn and interact dynamically in competitive environments and games (e.g. GANs). But this has been a difficult task, as irregular behaviors are commonly observed in such systems. This can be explained theoretically, for instance, by the works of Cheung & Piliouras (2019; 2020), which showed that in two-person zero-sum games, if agents employ one of the most well-known learning algorithms, Multiplicative Weights Update (MWU), then Lyapunov chaos occurs everywhere in the cumulative payoff space. In this paper, we study how persistent chaos can occur in the general normal-form game settings, where the agents might have the motivation to coordinate (which is not true for zero-sum games) and the number of agents can be arbitrary. We characterize bimatrix games where MWU, its optimistic variant (OMWU) or Follow-the-Regularized-Leader (FTRL) algorithms are Lyapunov chaotic almost everywhere in the cumulative payoff space. Our characterizations are derived by extending the volume-expansion argument of Cheung & Piliouras via the canonical game decomposition into zero-sum and coordination components. Interestingly, the two components induce opposite volume-changing behaviors, so the overall behavior can be analyzed by comparing the strengths of the components against each other. The comparison is done via our new notion of "matrix domination" or via a linear program. For multi-player games, we present a local equivalence of volume change between general games and graphical games, which is used for volume and chaos analyses of MWU and OMWU in potential games.

1. INTRODUCTION

In Machine Learning (ML), it is of primary interest to understand how agents learn in competitive environments. This interest has been propelled recently by the success of Generative Adversarial Networks (GANs), which can be viewed as two neural networks playing a zero-sum game. As such, Evolutionary Game Theory (EGT) (Hofbauer & Sigmund (1998); Sandholm (2010)), a decades-old area devoted to the study of adaptive (learning) behaviors of agents in competitive environments arising from Economics, Biology, and Physics, has drawn attention from the ML community. In contrast with the typical optimization (or no-regret) approach in ML, EGT provides a dynamical-systems perspective for understanding ML processes, which has already provided new insights into a number of ML-related problems. This perspective is particularly helpful in studying "learning in games", where irregular behaviors are commonly observed, but the ML community currently lacks a rigorous method to analyze such systems. In this paper, we study Lyapunov chaos, a central notion that captures instability and unpredictability in dynamical systems. We characterize general normal-form games where popular learning algorithms exhibit chaotic behaviors. Lyapunov chaos captures the butterfly effect: when the starting point of a dynamical system is slightly perturbed, the resulting trajectories and final outcomes diverge quickly; see Definition 1 for a formal definition. The perturbations correspond to round-off errors of numerical algorithms in ML (and Computer Science in general). While significant efforts have been spent in analyzing and minimizing round-off effects of floating-point computations (Demmel (1997)), they are unavoidable in general.
As round-offs are inevitable, and round-off schemes can vary from machine to machine due to various hardware and software factors, we surely want to avoid chaotic learning, which does not fulfill our primary goals of building predictable and reproducible learning systems. Such issues are exemplified by a quote from Ali Rahimi's NIPS 2017 test-of-time award speech: "Someone on another team changed the default rounding mode of some Tensorflow internals from 'truncate toward zero' to 'round to even'. Our training broke, our error rate went from less than 25% error to ∼99.97% error." To avoid chaotic learning, we first need to understand how it can arise. This is the main motivation of our work. Recently, in the context of "learning in games", Cheung & Piliouras (2019; 2020) presented theoretical analyses showing that in two-person zero-sum and graphical constant-sum games, if the agents employ Multiplicative Weights Update (MWU) or Follow-the-Regularized-Leader (FTRL) algorithms, then Lyapunov chaos occurs everywhere in the cumulative payoff space; the same result holds for the optimistic variant of MWU (OMWU) in coordination games. While zero-sum and coordination games are interesting in their own right, they are rather small subspaces within the family of general normal-form games. In this paper, we present techniques and tools for characterizing general games where MWU, FTRL, or OMWU is Lyapunov chaotic almost everywhere. Next, we give an overview of our contributions and a discussion of related work.

Our Contributions. To show the results about chaos mentioned above, Cheung & Piliouras (2019; 2020) used a classical technique in the study of dynamical systems called volume analysis. Volume analysis considers a set of starting points of positive Lebesgue measure, e.g. a ball centred at a point; volume is an alternative name for Lebesgue measure in this context.
When this set of starting points evolves according to the rule of the dynamical system, it becomes a new set with a different volume. Intuitively, volume is a measure of the range of possible outcomes, so the larger it is, the more unpredictable the system is. If the set's volume increases exponentially with time, then its diameter increases exponentially too, which implies Lyapunov chaos. This indicates that when players repeatedly play the game by employing the respective learning algorithms, a slight perturbation of the initial condition can lead to a wide range of possible cumulative payoffs in the long run; this can be shown to imply instability in the mixed strategy space. The technical starting point of Cheung & Piliouras is to show that when all agents in a bimatrix game G use MWU with step-size ε, and a set S in the cumulative payoff space evolves for one time step, its volume change can be expressed as ε^2 ∫_S C_G(s) ds + O(ε^4), where C_G is a function that depends on the game G, which we define in Section 2. Clearly, the sign of C_G dictates the volume-change behavior for all small enough ε. Cheung & Piliouras showed that C_G is a positive function when G is a two-person zero-sum game. For a large region in the cumulative payoff space, this implies the volume change per time step is Ω(ε^2) · volume(S), i.e. volume expands exponentially. They also showed that C_G is a negative function if G is a two-person coordination game. To extend their volume-expansion results to general bimatrix games, we first discover that C_G admits a clean decoupling w.r.t. the canonical decomposition of such games into zero-sum and coordination components (Basar & Ho (1974); Kalai & Kalai). Precisely, any two-person general game (A, B) can be written uniquely as a direct sum of a zero-sum game (Z, -Z) and a coordination game (C, C), where Z = (A - B)/2 and C = (A + B)/2. Interestingly, we find that C_G(·) = C_(Z,-Z)(·) + C_(C,C)(·) in Lemma 6.
Recall from the last paragraph that C_(Z,-Z)(·) is always positive, while C_(C,C)(·) is always negative. Thus, to see whether volume expansion occurs, it boils down to comparing the strengths of C_(Z,-Z)(·) and -C_(C,C)(·). We also discover that the function C_G is invariant under additions of trivial matrices to the game G; see Definition 7 and Lemma 8. An immediate application of trivial matrices is to bimatrix potential games (Monderer & Shapley (1996)), which we show can be transformed into coordination games via additions of trivial matrices. With the result in Cheung & Piliouras (2020), this immediately implies that OMWU in any bimatrix potential game is Lyapunov chaotic everywhere in the cumulative payoff space (Observation 10). Based on the above discoveries, we identify two characterizations of bimatrix games where MWU and FTRL are Lyapunov chaotic almost everywhere (Theorems 15 and 17). As said before, the key is to compare the strengths of the C-functions of the zero-sum and coordination components. The comparison is done via our new notion of matrix domination (Definition 11), and also via a linear program (Eqn. (7)) which is designed to prune away the trivial-matrix projection and keep the remaining part minimal. This family of games has positive Lebesgue measure in the bimatrix game space, so it is not confined to any proper game subspace. This justifies the claim that the occurrences of chaos are not merely circumstantial, but rather a substantial issue in learning in games. An analogous result holds for OMWU. For games with any number of players (multi-player games), we use an observation in Cheung & Piliouras (2019), coupled with our new findings about bimatrix games discussed above, to present a new family of graphical games where MWU is Lyapunov chaotic almost everywhere (Theorem 18); this new family strictly includes all graphical constant-sum games.
To facilitate volume analyses of learning in multi-player games, we establish a local equivalence of volume change between general games and graphical games. Briefly, we show that C_G(p) for a general game G is the same as C_H(p) for some graphical game H; H depends on the point p, which is why we say the equivalence is local (Theorem 19). This provides an intuitive procedure for understanding volume changes, as the volume change of learning in a graphical game is easier to compute. It is used to show that the volume-changing behaviors of MWU and OMWU are opposite to each other. We use these tools to analyze MWU and OMWU in multi-player potential games; in particular, we show that C_G(p) of a multi-player potential game G is identical to C_C(p) of a corresponding multi-player graphical coordination game, while C_C(p) ≤ 0 for any p (Proposition 21).

Related Work. MWU and its variants, such as FTRL and Optimistic MWU, play important roles in online learning. We recommend the texts of Cesa-Bianchi & Lugosi (2006) and Hart & Mas-Colell (2013) for a modern overview of online learning from the Machine Learning and Economics perspectives. Recently, a stream of works has examined how learning algorithms behave in games or min-max optimization from a dynamical-systems perspective. This provides new insights into learning systems which could hardly be obtained with classical tools in ML. For instance, some learning systems have been shown to be nearly periodic under the notion of Poincaré recurrence (Piliouras & Shamma (2014); Mertikopoulos et al. (2018)). Using potential functions first proposed in EGT and other tools from mathematics, surprising behaviors of first-order methods in zero-sum games and min-max optimization were discovered (Daskalakis & Panageas (2018; 2019); Bailey & Piliouras (2018); Cheung (2018)). In Appendix A, we give an account of some further related work. Empirical evidence of Lyapunov chaos of learning in games was reported by Sato et al.
(2002) and Galla & Farmer (2013). Li-Yorke chaos, another classical chaos notion, was proved to occur in several learning-in-game systems (Palaiopanos et al. (2017); Chotibut et al. (2020)). Volume analysis has long been a technique of interest in the study of population and game dynamics; it is discussed in a number of famous texts, see (Hofbauer & Sigmund, 1998, Section 11), (Fudenberg & Levine, 1998, Section 3) and (Sandholm, 2010, Chapter 9). We use game decomposition in this paper, which is a natural and generic approach to extend compelling results for a specific family of games to more general games. Let H denote a specific family of games with some compelling properties. Given a general game, we seek to decompose it into the sum of its projection on H plus one or more residue components. If the residues are small, then it is plausible that those compelling properties extend (approximately). In the search for games where learning is stable, game decompositions were used (Candogan et al. (2011; 2013a;b); Letcher et al. (2019)) with H being the family of potential games.

2. PRELIMINARY

In this paper, every bold lower-case letter denotes a vector, and every bold upper-case letter denotes a matrix or a game. When we say a "game", we always mean a normal-form game. Given n, let ∆^n denote the mixed strategy space of dimension n - 1, i.e. {(z_1, z_2, ..., z_n) | z_j ≥ 0 for all j, and Σ_{j=1}^n z_j = 1}.

Normal-Form Games. Let N denote the number of players in a game. Let S_i denote the strategy set of Player i, and S := S_1 × ... × S_N. Let n_i = |S_i|. s = (s_1, ..., s_N) ∈ S denotes a strategy profile of all players, and u_i(s) denotes the payoff to Player i when each player picks s_i. A mixed strategy profile is denoted by x = (x_1, ..., x_N) ∈ ∆^{n_1} × ... × ∆^{n_N}, and u_i is extended to take mixed strategies as inputs via u_i(x) = E_{s∼x}[u_i(s)]. We let -(i_1, ..., i_g) denote the set of players other than players i_1, ..., i_g. We also let

U^{i_1 i_2 ... i_g}_{j_1 j_2 ... j_g}(x) = E_{s_{-(i_1,...,i_g)} ∼ x_{-(i_1,...,i_g)}} [ u_{i_1}(s_{i_1} = j_1, ..., s_{i_g} = j_g, s_{-(i_1,...,i_g)}) ],   (1)

which is the expected payoff to Player i_1 when: for 1 ≤ f ≤ g, Player i_f picks strategy j_f, while each player i ∉ {i_1, ..., i_g} picks a strategy randomly following x_i. We write U^{i_1 i_2 ... i_g}_{j_1 j_2 ... j_g} when x is clear from the context. We say a game is a zero-sum game if Σ_i u_i(s) = 0 for all s ∈ S, and a coordination game if u_i(s) = u_k(s) for all Players i and k and for all s ∈ S. When N = 2, such games are called bimatrix games, for which we adopt the notation below. Let (A, B) denote a bimatrix game, where for any j ∈ S_1, k ∈ S_2, A_jk := u_1(j, k) and B_jk := u_2(j, k). x and y denote the mixed strategies of Players 1 and 2 respectively. A bimatrix game is a zero-sum game if A = -B; it is a coordination game if A = B. Note that U^1_j = [Ay]_j and U^2_k = [B^T x]_k, which we denote by Ā_j, B̄_k respectively when x, y are clear from the context; B̄_j, Ā_k are defined analogously.

MWU, FTRL and OMWU in Games.
All three algorithms have a step-size ε, and can be implemented as updates in the cumulative payoff (dual) space. In each round, the players' actions (mixed strategies) in the strategy space are functions of the cumulative payoff vectors defined below, and these actions are then used to determine the payoffs in the next round. For a player with d strategies, let p^t ∈ R^d denote her cumulative payoff vector at time t, and let p^0 ∈ R^d denote the starting point chosen by the player. For MWU in a game, the update rule for Player i is

p^{t+1}_j = p^t_j + ε · U^i_j(x^t),   (2)

where U^i_j is the function defined in Eqn. (1), and x^t is the mixed strategy

x^t_j = x_j(p^t) = exp(p^t_j) / Σ_{ℓ∈S_i} exp(p^t_ℓ).   (3)

For OMWU in a game, the update rule for Player i starts with p^1 = p^0, and for t ≥ 2,

p^{t+1}_j = p^t_j + ε · [2 U^i_j(x^t) - U^i_j(x^{t-1})],

where x^t is determined by Eqn. (3). For FTRL in a game, the update rule for Player i is the same as Eqn. (2), but x^t is determined as below using a convex regularizer function h_i : ∆^d → R:

x^t = argmax_{x∈∆^d} { ⟨p^t, x⟩ - h_i(x) }.

As all the results for MWU can be directly generalized to FTRL, as discussed in (Cheung & Piliouras, 2019, Appendix D), to keep our exposition simple we focus on MWU and OMWU, and their comparison, in the rest of this paper. For bimatrix games, we use p, q to denote the cumulative payoff vectors of Players 1 and 2 respectively.

Dynamical Systems, Lyapunov Chaos and Volume Analysis. A learning-in-game system can be viewed as a discrete-time dynamical system, for which we present a simplified definition that suits our needs. A discrete-time dynamical system in R^d is determined by a starting point r(0) ∈ R^d and an update rule r(t+1) = f(r(t)), where f : R^d → R^d is a function. The sequence r(0), r(1), r(2), ... is called a trajectory of the dynamical system.
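To make the update rules concrete, here is a minimal Python sketch of the MWU and OMWU dual-space updates for a bimatrix game (the function names `logit_map`, `mwu_step` and `omwu_step` are ours, not from the paper):

```python
import numpy as np

def logit_map(p):
    """Eqn. (3): map a cumulative payoff vector to a mixed strategy."""
    e = np.exp(p - p.max())          # subtract max for numerical stability
    return e / e.sum()

def mwu_step(p, q, A, B, eps):
    """Eqn. (2) for a bimatrix game (A, B): each player adds eps times
    her expected payoff vector to her cumulative payoff vector."""
    x, y = logit_map(p), logit_map(q)
    return p + eps * (A @ y), q + eps * (B.T @ x)

def omwu_step(p, q, A, B, eps, x_prev, y_prev):
    """OMWU: the payoff estimate is 2*U(x_t) - U(x_{t-1})."""
    x, y = logit_map(p), logit_map(q)
    p_new = p + eps * (2 * A @ y - A @ y_prev)
    q_new = q + eps * (2 * B.T @ x - B.T @ x_prev)
    return p_new, q_new, x, y
```

Iterating `mwu_step` keeps each player's mixed strategy on the simplex while the cumulative payoff vectors drift; `omwu_step` additionally carries the previous round's mixed strategies.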
When f is clear from the context, we let Φ : (N ∪ {0}) × R^d → R^d denote the function such that Φ(t, r) is the value of r(t) generated by the dynamical system with starting point set to r. Given a set U ⊂ R^d, we let Φ(t, U) = {Φ(t, r) | r ∈ U}. Let B(r, z) denote the open ball with center r and radius z. There are several similar but not identical definitions of Lyapunov chaos, all capturing the butterfly effect: when the starting point is slightly perturbed, the resulting trajectories diverge quickly. We use the following definition, which was also used implicitly by Cheung & Piliouras (2019; 2020). Intuitively, a system is Lyapunov chaotic in an open set O ⊂ R^d if for any r ∈ O and any open ball B around r, as long as Φ(t, B) remains inside O, there exists r' ∈ B such that ‖Φ(t, r') - Φ(t, r)‖ grows exponentially with t. The Lyapunov exponent in the definition below measures how fast the exponential growth is; the larger it is, the more unpredictable the dynamical system is.

Definition 1. A dynamical system is Lyapunov chaotic in an open set O ⊂ R^d if there exist a constant λ > 0 and a Lyapunov exponent γ ≡ γ(O) > 0, such that for any r ∈ O, for any sufficiently small δ > 0 and for all t satisfying 0 ≤ t < min{τ | τ ≥ 0, Φ(τ, B(r, δ)) ⊄ O},

sup_{r'∈B(r,δ)} ‖Φ(t, r') - Φ(t, r)‖ ≥ λ · δ · exp(γt).

Definition 2. A dynamical system is Lyapunov chaotic everywhere if it is Lyapunov chaotic in every bounded open subset of R^d.

In the above definitions, all norms and radii are Euclidean. For capturing round-off errors in computer algorithms and ML systems, it is more natural to use the ℓ∞-norm, with δ being the maximum round-off error per step, say ∼10^{-16} when IEEE 754 binary64 (standard double) is used. When O is a small set, it is easy to determine whether a dynamical system is Lyapunov chaotic in O, since the dynamic can be locally approximated by a linear dynamical system, whose local Jacobian's eigenvalues characterize the chaotic behavior.
But when O is large, determining whether Lyapunov chaos occurs is difficult in general. Cheung & Piliouras (2019) found that volume analysis can be useful in this regard, based on the following simple observation.

Proposition 3. In R^d, if a set U has volume at least v, then its radius w.r.t. any point r ∈ U is at least v^{1/d}/2. Thus, if the volume of Φ(t, U) of some dynamical system is Ω(exp(γt)) for some γ > 0, then the radius of Φ(t, U) w.r.t. any point r ∈ Φ(t, U) is Ω(exp((γ/d) · t)).

Cheung and Piliouras showed Lemma 4 below, which, for bimatrix games, reduces volume analysis to analyzing the sign of the function C_(A,B)(p, q) defined in Eqn. (4) below; the sign also determines the local volume-changing behavior around the point (p, q) when MWU is used. Based on Proposition 3, which converts volume expansion to radius expansion, the sign can be used to determine whether the dynamical system is Lyapunov chaotic. Let Ā_j = Σ_k A_jk y_k = ∇_{x_j}[x^T A y] and Ā_k = Σ_j x_j A_jk = ∇_{y_k}[x^T A y]; similarly, B̄_j = Σ_k B_jk y_k = ∇_{x_j}[x^T B y] and B̄_k = Σ_j x_j B_jk = ∇_{y_k}[x^T B y]. Then

C_(A,B)(p, q) = -E_x,y[(A_jk - Ā_j - Ā_k)(B_jk - B̄_j - B̄_k)] + E_x,y[A_jk] · E_x,y[B_jk].   (4)

Note that here x and y are shorthand for x(p) and y(q), the mixed strategies (i.e. probability distributions over strategies) of Players 1 and 2 respectively, as computed via Eqn. (3). Also, E_x,y[f(j, k)] = E_{(j,k)∼(x(p),y(q))}[f(j, k)] is the expected value of f(j, k) when the strategies j and k are randomly chosen according to the distributions x(p) and y(q) respectively. For a multi-player game G, the analogous function C_G(·) is given in Eqn. (5) below; the U quantities were defined in Eqn. (1). Lemma 4 is adapted from Cheung & Piliouras (2019) for games with any number of players. The derivation of Eqn. (5) uses the Jacobian of the corresponding dynamical system and integration by substitution; see Appendix D.
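The function in Eqn. (4) is directly computable from (A, B, p, q). Below is a short Python sketch (our own code; `logit_map` implements Eqn. (3)) that can be used to check the sign claims for zero-sum and coordination games discussed at the end of this section:

```python
import numpy as np

def logit_map(p):
    """Eqn. (3): cumulative payoffs -> mixed strategy."""
    e = np.exp(p - p.max())
    return e / e.sum()

def C_fn(A, B, p, q):
    """C_(A,B)(p, q) of Eqn. (4)."""
    x, y = logit_map(p), logit_map(q)
    A_row, A_col = A @ y, A.T @ x        # Abar_j and Abar_k
    B_row, B_col = B @ y, B.T @ x        # Bbar_j and Bbar_k
    dev_A = A - A_row[:, None] - A_col[None, :]
    dev_B = B - B_row[:, None] - B_col[None, :]
    w = np.outer(x, y)                   # joint law of (j, k) under (x, y)
    return -np.sum(w * dev_A * dev_B) + (x @ A @ y) * (x @ B @ y)
```

On random instances, `C_fn(A, -A, p, q)` is non-negative (zero-sum) and `C_fn(A, A, p, q)` is non-positive (coordination), matching the variance argument at the end of this section.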
C_G(p_1, ..., p_N) = - Σ_{i∈[N], j∈S_i} Σ_{k>i, ℓ∈S_k} x_{ij} x_{kℓ} (U^{ki}_{ℓj} - U^k_ℓ)(U^{ik}_{jℓ} - U^i_j).   (5)

Lemma 4. Suppose O is a set in the cumulative payoff space R^d, where d = n_1 + ... + n_N, such that

c_G(O) := inf_{(p_1,...,p_N)∈O} C_G(p_1, ..., p_N) > 0.   (6)

Then the dynamical system in which MWU with any sufficiently small step-size ε is employed to play the game G is Lyapunov chaotic in O with Lyapunov exponent c_G(O) · ε^2 / (2d). If MWU is replaced by OMWU, then the same result holds with condition (6) replaced by

c_G(O) := inf_{(p_1,...,p_N)∈O} [-C_G(p_1, ..., p_N)] > 0.

Note that if we start from a Nash equilibrium in the strategy space, MWU and OMWU stay at the equilibrium. However, if this equilibrium (x*, y*) satisfies the conditions in Corollary 5 below, there are points arbitrarily close to the equilibrium that keep moving away from it (if the region {(x', y') = (x(p), y(q)) | (p, q) ∈ O} is large in the strategy simplex ∆^{n_1} × ∆^{n_2}).

Corollary 5 (Adapted from (Cheung & Piliouras, 2020, Theorem 5)). Let (x*, y*) be a point in the interior of the strategy space. Suppose that there exists (p*, q*) in the cumulative payoff space such that x* = x(p*) and y* = y(q*). Furthermore, suppose C_(A,B)(p*, q*) > 0 and (p*, q*) ∈ O, where O is the set described in Lemma 4. Then there are strategy points arbitrarily close to (x*, y*) such that MWU in the game (A, B) eventually leaves the corresponding strategy set of O, i.e. {(x', y') = (x(p), y(q)) | (p, q) ∈ O}.

We give the intuition behind the proof of Corollary 5. Suppose the contrary, i.e. for some open neighbourhood of (p*, q*), its flow never escapes from O. Then there are two contradicting facts. First, the volume of the flow expands at least exponentially with time. Second, by Eqn. (2), each p^t_j grows at most linearly with t (since |U^i_j| ≤ max_{j,k}{|A_jk|, |B_jk|}), and thus the volume of the flow can expand at most polynomially with time.
When the game is zero-sum, i.e. B = -A, Eqn. (4) gives

C_(A,B)(p, q) = E_x,y[(A_jk - Ā_j - Ā_k)^2] - (E_x,y[A_jk])^2.

Since E_x,y[A_jk] = E_x,y[Ā_j] = E_x,y[Ā_k], we have E_x,y[A_jk - Ā_j - Ā_k] = -E_x,y[A_jk], so C_(A,B)(p, q) is exactly the variance of the random variable A_jk - Ā_j - Ā_k, and thus is non-negative. By Eqn. (4), C_(A,B)(p, q) = -C_(A,-B)(p, q). Thus, for any coordination game (A, A), we have C_(A,A)(p, q) = -C_(A,-A)(p, q) ≤ 0.

3. BIMATRIX GAMES

In this section, we focus on general bimatrix games (A, B). In Section 3.1, we present two tools for analyzing C (A,B) (•), and then we provide an example to show how to use these tools. In Section 3.2, we present two characterizations such that the dynamics are Lyapunov chaotic almost everywhere.

3.1. TOOLS FOR ANALYZING BIMATRIX GAME

First Tool: Canonical Decomposition for Bimatrix Games. Every bimatrix game (A, B) admits a canonical decomposition (Basar & Ho (1974); Kalai & Kalai) into the sum of a zero-sum game (Z, -Z) and a coordination game (C, C), where Z = (A - B)/2 and C = (A + B)/2, i.e. (A, B) = (Z, -Z) + (C, C). We call (Z, -Z) the zero-sum part of the game (A, B), and (C, C) the coordination part. Our first result shows that the function C(·) decomposes neatly into the two parts too.

Lemma 6. For any bimatrix game (A, B), C_(A,B)(p, q) ≡ C_(Z,-Z)(p, q) + C_(C,C)(p, q).

Proof. We use Eqn. (4) to expand the following:

4 · C_(Z,-Z)(p, q) + 4 · C_(C,C)(p, q)
= E[(A_jk - B_jk - Ā_j + B̄_j - Ā_k + B̄_k)^2] - (E[A_jk - B_jk])^2
  - E[(A_jk + B_jk - Ā_j - B̄_j - Ā_k - B̄_k)^2] + (E[A_jk + B_jk])^2
= E[(A_jk - B_jk - Ā_j + B̄_j - Ā_k + B̄_k)^2 - (A_jk + B_jk - Ā_j - B̄_j - Ā_k - B̄_k)^2]
  - (E[A_jk] - E[B_jk])^2 + (E[A_jk] + E[B_jk])^2
= E[4(-B_jk + B̄_j + B̄_k)(A_jk - Ā_j - Ā_k)] + 4 · E[A_jk] · E[B_jk]
= 4 · C_(A,B)(p, q).

At the end of Section 2, we discussed that C_(Z,-Z)(p, q) is always non-negative and C_(C,C)(p, q) is always non-positive. By the above lemma, we can analyze the volume-changing behavior of a bimatrix game (A, B) by looking at its zero-sum and coordination parts independently. One simple intuition is that if the coordination (resp. zero-sum) part is small, then the volume-changing behavior of (A, B) is close to the behavior of the zero-sum (resp. coordination) part. We realize this intuition quantitatively in the next subsection.

Second Tool: Trivial Matrices. Trivial matrices are matrices which do not affect the volume-changing behavior, as depicted in Lemma 8 below.

Definition 7 (Trivial Matrix). T ∈ R^{n×m} is a trivial matrix if there exist real numbers u_1, u_2, ..., u_n and v_1, v_2, ..., v_m such that T_jk = u_j + v_k for all j ∈ [n], k ∈ [m].

Lemma 8.
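Lemma 6 is easy to verify numerically. A self-contained sketch (our own helper names; `C_fn` implements Eqn. (4)):

```python
import numpy as np

def logit_map(p):
    e = np.exp(p - p.max())
    return e / e.sum()

def C_fn(A, B, p, q):
    # Eqn. (4)
    x, y = logit_map(p), logit_map(q)
    dev_A = A - (A @ y)[:, None] - (A.T @ x)[None, :]
    dev_B = B - (B @ y)[:, None] - (B.T @ x)[None, :]
    return -np.sum(np.outer(x, y) * dev_A * dev_B) + (x @ A @ y) * (x @ B @ y)

def decompose(A, B):
    """Canonical decomposition: (A, B) = (Z, -Z) + (C, C)."""
    return (A - B) / 2, (A + B) / 2

rng = np.random.default_rng(1)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
Z, Cm = decompose(A, B)
p, q = rng.normal(size=3), rng.normal(size=3)
lhs = C_fn(A, B, p, q)                        # left side of Lemma 6
rhs = C_fn(Z, -Z, p, q) + C_fn(Cm, Cm, p, q)  # right side of Lemma 6
```

Up to floating-point error, `lhs` equals `rhs`, as Lemma 6 asserts.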
For any two trivial matrices T_1, T_2 and any two matrices A, B ∈ R^{n×m}, C_(A,B)(p, q) ≡ C_(A+T_1, B+T_2)(p, q).

One immediate application of this lemma is to two-player potential games.

Definition 9. A game G is a potential game if there exists a potential function P : S → R such that for any Player i, any strategy profile s ∈ S and any s'_i ∈ S_i, P(s'_i, s_{-i}) - P(s_i, s_{-i}) = u_i(s'_i, s_{-i}) - u_i(s_i, s_{-i}).

For potential games, we have the following observation:

Observation 10. For any bimatrix potential game (A, B), there is a coordination game (P, P) such that A - P and B - P are trivial matrices, where P is the matrix representation of the potential function P.

This observation immediately implies that the volume-changing behavior of a potential game is equivalent to that of a corresponding coordination game. We now give a concrete example to show how these tools help us analyze C_(A,B)(·).

A Simple Example. We show how to use our tools to demonstrate that C(·) ≥ 0 everywhere for the following game, in which each player has three strategies. The first number in each entry gives the payoff of the row player, who chooses a strategy from {a, b, c}; the second number gives the payoff of the column player, who chooses a strategy from {1, 2, 3}:

         1           2           3
  a    (4, 4)     (12, -4)    (-6, 10)
  b    (-8, 8)     (0, 0)     (12, -4)
  c    (14, -2)    (-8, 8)     (4, 4)

We first use our first tool to decompose this game into its zero-sum part (Z, -Z) and coordination part (C, C), where

  Z = [ 0 8 -8 ; -8 0 8 ; 8 -8 0 ]   and   C = [ 4 2 4 ; 2 0 2 ; 4 2 4 ] + [ 0 2 -2 ; -2 0 2 ; 2 -2 0 ],

where the first matrix in the expansion of C is a trivial matrix (using the notation of Definition 7, u = v^T = [2, 0, 2]). It is easy to see that the second matrix is (1/4)Z. Then by Lemmas 6 and 8, and the definition of the function C, for any point (p, q) in the cumulative payoff space,

  C_(A,B)(p, q) = C_(Z,-Z)(p, q) + C_((1/4)Z, (1/4)Z)(p, q) = (1 - (1/4)^2) · C_(Z,-Z)(p, q) ≥ 0.
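Lemma 8 can be checked the same way: adding trivial matrices leaves the C function unchanged. A sketch (our own code; `C_fn` is Eqn. (4) again):

```python
import numpy as np

def logit_map(p):
    e = np.exp(p - p.max())
    return e / e.sum()

def C_fn(A, B, p, q):
    # Eqn. (4)
    x, y = logit_map(p), logit_map(q)
    dev_A = A - (A @ y)[:, None] - (A.T @ x)[None, :]
    dev_B = B - (B @ y)[:, None] - (B.T @ x)[None, :]
    return -np.sum(np.outer(x, y) * dev_A * dev_B) + (x @ A @ y) * (x @ B @ y)

def trivial(u, v):
    """Definition 7: T_jk = u_j + v_k."""
    return u[:, None] + v[None, :]

rng = np.random.default_rng(2)
A, B = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
T1 = trivial(rng.normal(size=3), rng.normal(size=4))
T2 = trivial(rng.normal(size=3), rng.normal(size=4))
p, q = rng.normal(size=3), rng.normal(size=4)
before = C_fn(A, B, p, q)
after = C_fn(A + T1, B + T2, p, q)   # Lemma 8: identical to `before`
```

The equality `before == after` (up to floating-point error) holds for any choice of the trivial matrices, which is exactly the invariance used in the example above.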

3.2. RESULTS FOR BIMATRIX GAMES

In this subsection, we identify characterizations of general bimatrix games for which the MWU dynamic exhibits chaotic behavior in the following set S_δ in the cumulative payoff space R^{n_1+n_2}:

S_δ = {(p, q) | ∀j ∈ S_1, k ∈ S_2 : x_j(p) ≥ δ and y_k(q) ≥ δ}.

Note that when δ is tiny, the strategy correspondence of S_δ covers almost the entire strategy simplex ∆^{n_1} × ∆^{n_2}; thus we informally say that if the dynamical system is Lyapunov chaotic in S_δ for a tiny δ, then it is Lyapunov chaotic almost everywhere. To show chaotic behavior of MWU in a specific bimatrix game (A, B), it suffices, by Lemma 4, to show that C_(A,B)(p, q) is strictly positive in the region S_δ. In the previous subsection, we showed that each game (A, B) can be decomposed into a zero-sum part (Z, -Z) and a coordination part (C, C), and that C_(A,B)(p, q) = C_(Z,-Z)(p, q) + C_(C,C)(p, q). We also offered the intuition that if the zero-sum part is small, then the volume behavior in the game (A, B) is similar to that of its coordination part; conversely, if the coordination part is small, then the volume behavior is similar to that of its zero-sum part. However, we have not yet presented a way to compare the sizes of the two parts. This is what we do here.

3.2.1. FIRST CHARACTERIZATION: MATRIX DOMINATION

The first characterization we identify is matrix domination. In this part, we show that under a certain condition, the zero-sum part always outweighs the coordination part, i.e. C_(Z,-Z)(p, q) ≥ -C_(C,C)(p, q) for all (p, q). This directly implies that C_(A,B)(p, q) is non-negative in the whole cumulative payoff space. Interestingly, the condition we identify is both necessary and sufficient. A similar result can be achieved in the case where the coordination part always outweighs the zero-sum part. We first introduce the definition of matrix domination.

Definition 11. We say matrix K dominates matrix L if they are of the same dimension, and for any row indices j, j' and column indices k, k',

|K_jk + K_j'k' - K_jk' - K_j'k| ≥ |L_jk + L_j'k' - L_jk' - L_j'k|.

Note that domination is transitive: if K dominates L and L dominates M, then K dominates M. The theorem below gives the necessary and sufficient condition.

Theorem 12. C_(A,B)(p, q) is non-negative for all (p, q) if and only if the matrix Z of the zero-sum part dominates the matrix C of the coordination part.

The above theorem is based on the following crucial observation.

Observation 13. For any matrix Z,

C_(Z,-Z)(p, q) = (1/4) Σ_{j,j'∈S_1} Σ_{k,k'∈S_2} x_j(p) · y_k(q) · x_j'(p) · y_k'(q) · (Z_jk + Z_j'k' - Z_jk' - Z_j'k)^2.

Matrix domination only implies that C_(A,B)(p, q) is non-negative. In order for C_(A,B)(p, q) to be strictly positive in the set S_δ, we need θ-domination.

Definition 14. We say matrix K θ-dominates matrix L (θ > 0) if K dominates L, and there exist j, j', k, k' such that |K_jk + K_j'k' - K_jk' - K_j'k| ≥ |L_jk + L_j'k' - L_jk' - L_j'k| + θ.

The following theorem holds due to Lemma 4.

Theorem 15. For any general bimatrix game (A, B) which is decomposed into zero-sum part (Z, -Z) and coordination part (C, C), if Z θ-dominates C, then MWU with any sufficiently small step-size ε in the game (A, B) is Lyapunov chaotic in S_δ with Lyapunov exponent θ^2 δ^4 ε^2 / (2(n_1 + n_2)).
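Definitions 11 and 14 reduce to finitely many inequalities over 2×2 minors, so domination is directly checkable. A sketch (the function name `dominates` is ours), illustrated on the matrices Z and C = T + (1/4)Z from the example of Section 3.1:

```python
import numpy as np
from itertools import product

def dominates(K, L, theta=0.0):
    """Definition 11 (and, for theta > 0, Definition 14): check whether
    |K_jk + K_j'k' - K_jk' - K_j'k| >= |L_jk + L_j'k' - L_jk' - L_j'k|
    for all index pairs, with slack >= theta for at least one of them."""
    n, m = K.shape
    slack_found = (theta == 0.0)
    for j, jp, k, kp in product(range(n), range(n), range(m), range(m)):
        dK = abs(K[j, k] + K[jp, kp] - K[j, kp] - K[jp, k])
        dL = abs(L[j, k] + L[jp, kp] - L[j, kp] - L[jp, k])
        if dK < dL:
            return False
        if dK >= dL + theta:
            slack_found = True
    return slack_found

# the decomposition from the example: C = T + Z/4 with a trivial T
Z = np.array([[0., 8, -8], [-8, 0, 8], [8, -8, 0]])
u = np.array([2., 0, 2])
T = u[:, None] + u[None, :]          # trivial matrix, u = v^T = [2, 0, 2]
Cm = T + Z / 4
```

Since trivial matrices contribute nothing to the 2×2-minor combinations, every combination of Cm has one quarter the magnitude of the corresponding combination of Z, so Z dominates Cm (even θ-dominates it) while Cm does not dominate Z.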
Note that in Definition 14, K θ-dominates L if a finite number of inequalities are satisfied. In the context of Theorem 15, it is easy to see that there are many games (A, B) such that Z θ-dominates C with all those inequalities strictly satisfied. Thus, there exists an open neighbourhood around each such game in which every game has its zero-sum part θ-dominating its coordination part. This shows that this family of games has positive Lebesgue measure.

3.2.2. SECOND CHARACTERIZATION: LINEAR PROGRAM

Matrix domination does not always hold: in some games, the zero-sum matrix does not dominate the coordination matrix. Yet it is still possible that C_(A,B)(p, q) is strictly positive in the region S_δ, for instance when every entry of the coordination matrix is small. Precisely, for a general bimatrix game (A, B), if its coordination part (C, C) is small in the sense that the absolute values of all entries in C are smaller than some constant r, then we can bound C_(C,-C)(·) by O(r^2). This is not the only case where C_(C,-C)(·) can be bounded by a small term: even if the entries of C are large, we can use trivial matrices to reduce them without affecting C_(C,-C)(·). This is done via the linear-programming approach described below. Given a matrix K, let r(K) be the optimal value of the following linear program:

min_{r≥0, g, h}  r   such that   ∀j, k :  -r ≤ K_jk - g_j - h_k ≤ r.   (7)

Note that {g_j + h_k}_{j,k} is a trivial matrix. Let K' = K - {g_j + h_k}_{j,k}. By Lemma 8, C_(K,-K)(·) = C_(K',-K')(·). The following lemma shows that the value of C_(K,-K)(·) is closely related to r(K).

Lemma 16. For any (p, q) in S_δ = {(p, q) | ∀j, k : x_j(p) ≥ δ and y_k(q) ≥ δ},

(r(K) · δ)^2 ≤ C_(K,-K)(p, q) ≤ r(K)^2.

The theorem below follows by applying Lemma 16 together with Lemma 4.

Theorem 17. For any general bimatrix game (A, B) which is decomposed into zero-sum part (Z, -Z) and coordination part (C, C), if there exists θ > 0 such that (r(Z) · δ)^2 ≥ r(C)^2 + (θδ^2)^2, then MWU with any sufficiently small step-size ε in the game (A, B) is Lyapunov chaotic in S_δ with Lyapunov exponent θ^2 δ^4 ε^2 / (2(n_1 + n_2)).

Intuitively, r(Z) is a distance measure from the zero-sum game (Z, -Z) to the trivial game space; analogously, r(C) is a distance measure from the coordination game (C, C) to the trivial game space. Theorem 17 shows that if the coordination part is much closer to the trivial game space than the zero-sum part, then MWU in this game is Lyapunov chaotic in S_δ.
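The linear program (7) is small and can be solved with any LP solver; below is a sketch using `scipy.optimize.linprog` (the variable encoding is our own; we assume SciPy is available):

```python
import numpy as np
from scipy.optimize import linprog

def r_value(K):
    """Optimal value r(K) of LP (7): min r s.t. -r <= K_jk - g_j - h_k <= r.
    A max-norm distance from K to the space of trivial matrices."""
    n, m = K.shape
    # decision variables: [r, g_1..g_n, h_1..h_m]
    c = np.zeros(1 + n + m)
    c[0] = 1.0
    rows, rhs = [], []
    for j in range(n):
        for k in range(m):
            # K_jk - g_j - h_k <= r  <=>  -r - g_j - h_k <= -K_jk
            a = np.zeros(1 + n + m)
            a[0], a[1 + j], a[1 + n + k] = -1, -1, -1
            rows.append(a); rhs.append(-K[j, k])
            # -(K_jk - g_j - h_k) <= r  <=>  -r + g_j + h_k <= K_jk
            a = np.zeros(1 + n + m)
            a[0], a[1 + j], a[1 + n + k] = -1, 1, 1
            rows.append(a); rhs.append(K[j, k])
    bounds = [(0, None)] + [(None, None)] * (n + m)  # r >= 0, g and h free
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    assert res.status == 0
    return res.fun
```

For any trivial matrix, r(K) = 0; for K equal to the 2×2 identity, the optimum is r = 1/2, attained e.g. with g = h = (1/4, 1/4).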
4. EXPERIMENT

To illustrate that volume expansion occurs when MWU is employed in a game with a small coordination part, we simulate MWU in the reduced payoff space foot_9 in a game which is the sum of the zero-sum game $\big(\begin{smallmatrix}1&0\\0&1\end{smallmatrix}\big), \big(\begin{smallmatrix}-1&0\\0&-1\end{smallmatrix}\big)$ and the coordination game $\big(\begin{smallmatrix}-0.05&0.03\\0.03&-0.05\end{smallmatrix}\big), \big(\begin{smallmatrix}-0.05&0.03\\0.03&-0.05\end{smallmatrix}\big)$. In the strategy space, $x^* = y^* = (0.5, 0.5)$ is the unique Nash equilibrium of the game. In the reduced dual space, the origin corresponds to this equilibrium. We pick a square of side length 0.004 around the origin as the set of starting points (the small red square in the middle of Figure 1). As these starting points evolve via MWU with step-size 0.02, we take snapshots after every 1900 time steps, shown in blue, pink, lime, purple, orange and green (the colors then repeat) respectively. As the figure shows, the volume (i.e., the area in the two-dimensional space) increases, and the shape changes from a square to a parallelogram.
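The experiment above can be reproduced in a few lines. The sketch below is a minimal re-implementation (all helper names are ours, not the authors' code): it evolves the four corners of the starting square under MWU in the cumulative payoff space, projects to the reduced coordinates $(p_1 - p_2,\, q_1 - q_2)$, and checks via the shoelace formula that the area has grown after one snapshot interval of 1900 steps:

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

# A = Z + C, B = -Z + C, the game of Section 4
A = [[0.95, 0.03], [0.03, 0.95]]
B = [[-1.05, 0.03], [0.03, -1.05]]
eps = 0.02  # step-size

def mwu_step(p, q):
    # MWU in the cumulative payoff (dual) space
    x, y = softmax(p), softmax(q)
    p_new = [p[j] + eps * (A[j][0] * y[0] + A[j][1] * y[1]) for j in range(2)]
    q_new = [q[k] + eps * (B[0][k] * x[0] + B[1][k] * x[1]) for k in range(2)]
    return p_new, q_new

def shoelace(pts):
    # area of a quadrilateral given its vertices in order
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

# corners of the starting square (side 0.004) in the reduced space (p1-p2, q1-q2)
h = 0.002
corners = [(-h, -h), (h, -h), (h, h), (-h, h)]
states = [([a / 2, -a / 2], [b / 2, -b / 2]) for a, b in corners]
area0 = shoelace(corners)
for _ in range(1900):  # one snapshot interval
    states = [mwu_step(p, q) for p, q in states]
area1 = shoelace([(p[0] - p[1], q[0] - q[1]) for p, q in states])
assert area1 > area0  # the square's area expands, as in Figure 1
```

Tracking only the four corners is a good approximation here because the square is tiny, so the map is locally close to linear.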

5. CONCLUSION AND FUTURE WORKS

In this paper, we analyze the volume-changing behavior of several well-known learning algorithms (MWU, OMWU, FTRL) on general bimatrix games and multi-player games, which leads to a Lyapunov chaos analysis. For bimatrix games, we do this by decomposing a game into a zero-sum part and a coordination part. This decomposition turns the volume analysis into a comparison of the strengths of volume expansion (zero-sum part) and volume contraction (coordination part) of the MWU dynamics. The comparison is made via the notion of matrix domination and via a linear program. For multi-player games, using the local equivalence, we show that the volume-changing behaviors of MWU and OMWU are opposite to each other. We also show that, for a general multi-player potential game, the key function $C^G$ equals that of a corresponding multi-player coordination game, which implies it is non-positive. Studying learning in matrix (normal-form) games, among the most classical game models, is a good theoretical starting point: matrix games admit mathematically amenable analyses, as demonstrated in our work and many previous works. For future work, we are interested in chaos analyses in settings more relevant to applications in ML, e.g. general GANs and differential games. We believe the techniques we use (volume analysis, game decomposition, etc.) remain applicable there.

A FURTHER RELATED WORK

In the study of no-regret learning (e.g. Littlestone & Warmuth (1994); Freund & Schapire (1995)), a vast literature concerns general or even adversarial settings, in which the online arrivals of payoff values follow no pattern or even come from an adversary. More recently, settings where the online payoffs are better behaved, studied under the term "predictable sequence" coined by Rakhlin & Sridharan (2013b), have received attention. These settings include game dynamics, as the online payoffs are determined by the mixed strategy choices of the players, while these choices are updated gradually and somewhat predictably. For these settings, online learning algorithms that perform particularly well, e.g. achieving regret bounds below the canonical O(√T) limit, have been designed and studied (Hazan & Kale (2010); Chiang et al. (2012); Syrgkanis et al. (2015)). For instance, Nesterov's excessive gap technique and optimistic mirror descent achieve near-optimal regret O(log T) in zero-sum games (Daskalakis et al. (2015); Rakhlin & Sridharan (2013a)), and thus the empirical average of the learning sequence converges to a Nash equilibrium of the game (see Freund & Schapire (1996) for an explanation). OMWU (with time-varying step-sizes), and more generally the optimistic variant of FTRL (Rakhlin & Sridharan (2013b)), are canonical examples of such online learning algorithms. Recently, a stream of works has examined how learning algorithms behave in games or min-max optimization from a dynamical-systems perspective. Replicator dynamics (RD; the continuous-time analogue of MWU) and continuous-time FTRL are found to achieve optimal regret in general settings (Mertikopoulos et al. (2018)).
Furthermore, RD in zero-sum games or graphical constant-sum games admits a constant of motion and preserves volume; these two properties are used to show that such dynamical systems are near-periodic (Piliouras & Shamma (2014); Mertikopoulos et al. (2018); Boone & Piliouras (2019); Vlatakis-Gkaragkounis et al. (2019); Perolat et al. (2020)), captured rigorously under the notion of Poincaré recurrence (Poincaré (1890); Barreira (2006)). However, when MWU, the forward Euler discretization of RD, is used in the discrete-time setting in zero-sum games, the near-periodicity is destroyed entirely; indeed, the system never visits the same point (or any tiny neighbourhood of it) twice; instead it converges toward the boundary of the strategy simplex and fluctuates there irregularly (Bailey & Piliouras (2018); Cheung (2018)). In contrast, (discrete-time) OMWU in zero-sum games is shown to converge to Nash equilibrium (Daskalakis & Panageas (2019)); yet, in the more general setting of min-max optimization, it was found that Optimistic Gradient Descent Ascent (OGDA) can have limit points other than (local) min-max solutions (Daskalakis & Panageas (2018)).

B PROOFS IN SECTION 3

Proof of Lemma 8. First, observe that it suffices to prove the lemma when $T^1$ is a trivial matrix and $T^2$ is the zero matrix; the lemma then holds for any trivial matrices $T^1, T^2$ by symmetry: $C^{(A,B)}(p,q) = C^{(A+T^1,B)}(p,q) = C^{(A+T^1,B+T^2)}(p,q)$. By the definition of a trivial matrix, we can write $T^1_{jk} = u_j + v_k$. Then
$$C^{(A+T^1,B)}(p,q) - C^{(A,B)}(p,q)$$
$$= -\mathbb E\Big[\Big(A_{jk} + u_j + v_k - A_j - u_j - \sum_{\ell\in S_2} v_\ell y_\ell - A_k - v_k - \sum_{\ell\in S_1} u_\ell x_\ell\Big)\big(B_{jk} - B_j - B_k\big)\Big] + \mathbb E[A_{jk} + u_j + v_k]\cdot\mathbb E[B_{jk}] + \mathbb E\big[(A_{jk} - A_j - A_k)(B_{jk} - B_j - B_k)\big] - \mathbb E[A_{jk}]\cdot\mathbb E[B_{jk}]$$
$$= -\mathbb E\Big[\Big(-\sum_{\ell\in S_2} v_\ell y_\ell - \sum_{\ell\in S_1} u_\ell x_\ell\Big)\big(B_{jk} - B_j - B_k\big)\Big] + \mathbb E[u_j + v_k]\cdot\mathbb E[B_{jk}]$$
$$= \mathbb E[v_k + u_j]\cdot\mathbb E[B_{jk} - B_j - B_k] + \mathbb E[u_j + v_k]\cdot\mathbb E[B_{jk}].$$
Recalling that $\mathbb E[B_{jk} - B_j - B_k] = -\mathbb E[B_{jk}]$, we have $C^{(A+T^1,B)}(p,q) - C^{(A,B)}(p,q) = 0$.

Proof of Observation 10. Let $P_{jk}$ be the potential value of a potential game when Player 1 plays strategy $j$ and Player 2 plays strategy $k$. By the definition of the potential function, for any $j_1, j_2$ and $k$, $A_{j_1 k} - A_{j_2 k} = P_{j_1 k} - P_{j_2 k}$. In particular, for any $j, k$, $A_{jk} = P_{jk} + A_{1k} - P_{1k}$. This implies that there exists $v$ such that $A_{jk} = P_{jk} + v_k$ for all $j$ and $k$. Similarly, there exists $u$ such that $B_{jk} = P_{jk} + u_j$ for all $j$ and $k$. Hence any two-player potential game is a coordination game plus trivial matrices.

Proof of Theorem 12. We first prove that if $Z$ dominates $C$, then $C^{(A,B)}(p,q)$ is always non-negative. By Observation 13,
$$C^{(A,B)}(p,q) = C^{(Z,-Z)}(p,q) + C^{(C,C)}(p,q) = C^{(Z,-Z)}(p,q) - C^{(C,-C)}(p,q)$$
$$= \frac14\sum_{j,j',k,k'} x_j(p)\,y_k(q)\,x_{j'}(p)\,y_{k'}(q)\cdot\Big[\big(Z_{jk} + Z_{j'k'} - Z_{jk'} - Z_{j'k}\big)^2 - \big(C_{jk} + C_{j'k'} - C_{jk'} - C_{j'k}\big)^2\Big] \ge 0.$$
In contrast, if $Z$ does not dominate $C$, then there exist $\hat j, \hat j', \hat k, \hat k'$ and $\delta > 0$ such that
$$\big(C_{\hat j\hat k} + C_{\hat j'\hat k'} - C_{\hat j\hat k'} - C_{\hat j'\hat k}\big)^2 \ge \big(Z_{\hat j\hat k} + Z_{\hat j'\hat k'} - Z_{\hat j\hat k'} - Z_{\hat j'\hat k}\big)^2 + \delta.$$
For each $\eta > 0$, we construct $p$ and $q$ such that $x_{\hat j}(p) = x_{\hat j'}(p) = y_{\hat k}(q) = y_{\hat k'}(q) = \frac{1-\eta}{2}$.
Furthermore, let $\Upsilon$ denote the maximum absolute value over all entries of the matrices $A$ and $B$; then, for all $j$ and $k$, $|Z_{jk}| \le \Upsilon$ and $|C_{jk}| \le \Upsilon$. Therefore,
$$C^{(A,B)}(p,q) = C^{(Z,-Z)}(p,q) + C^{(C,C)}(p,q) = C^{(Z,-Z)}(p,q) - C^{(C,-C)}(p,q)$$
$$= \frac14\sum_{j,j',k,k'} x_j(p)\,y_k(q)\,x_{j'}(p)\,y_{k'}(q)\cdot\Big[\big(Z_{jk}+Z_{j'k'}-Z_{jk'}-Z_{j'k}\big)^2 - \big(C_{jk}+C_{j'k'}-C_{jk'}-C_{j'k}\big)^2\Big]$$
$$\le -\delta\Big(\frac{1-\eta}{2}\Big)^4 + |S_1|^2\cdot|S_2|^2\cdot\eta\cdot 16\Upsilon^2.$$
The last inequality holds as $\big(C_{jk}+C_{j'k'}-C_{jk'}-C_{j'k}\big)^2 - \big(Z_{jk}+Z_{j'k'}-Z_{jk'}-Z_{j'k}\big)^2 \le 16\Upsilon^2$. The value $-\delta\big(\frac{1-\eta}{2}\big)^4 + |S_1|^2\cdot|S_2|^2\cdot\eta\cdot 16\Upsilon^2$ is negative if we pick a small enough $\eta$.

Proof of Observation 13. Consider a random process where $j, j' \in S_1$ are picked independently according to the distribution $x(p)$, and $k, k' \in S_2$ are picked independently according to $y(q)$. Then the RHS of Observation 13 can be expressed as $\frac14\cdot\mathbb E\big[(Z_{jk} + Z_{j'k'} - Z_{jk'} - Z_{j'k})^2\big]$. Expanding the squared term inside the expectation and observing the symmetries within the expansion, we immediately have
$$\tfrac14\cdot\mathbb E\big[(Z_{jk} + Z_{j'k'} - Z_{jk'} - Z_{j'k})^2\big] = \mathbb E\big[(Z_{jk})^2\big] - \mathbb E[Z_{jk} Z_{jk'}] - \mathbb E[Z_{jk} Z_{j'k}] + \mathbb E[Z_{jk} Z_{j'k'}].$$
Let $Z_j := [Zy]_j$ and $Z_k := [Z^T x]_k$. Then
$$\mathbb E[Z_{jk} Z_{jk'}] = \sum_{j,k} x_j y_k Z_{jk} \sum_{k'} y_{k'} Z_{jk'} = \sum_j x_j [Zy]_j \sum_k y_k Z_{jk} = \sum_j x_j (Z_j)^2 = \mathbb E\big[(Z_j)^2\big].$$
Similarly, $\mathbb E[Z_{jk} Z_{j'k}] = \mathbb E[(Z_k)^2]$. Lastly, $\mathbb E[Z_{jk} Z_{j'k'}] = \mathbb E[Z_{jk}]^2$. Thus, the RHS of Observation 13 simplifies to $\mathbb E[(Z_{jk})^2] - \mathbb E[(Z_j)^2] - \mathbb E[(Z_k)^2] + \mathbb E[Z_{jk}]^2$. We complete the proof by noting that, from the definition of $C^{(Z,-Z)}(\cdot)$ in Eqn. (4), $C^{(Z,-Z)}(\cdot)$ can be rewritten as $\mathbb E[(Z_{jk})^2] - \mathbb E[Z_j Z_{jk}] - \mathbb E[Z_k Z_{jk}] + \mathbb E[Z_{jk}]^2$, while
$$\mathbb E[Z_j Z_{jk}] = \sum_j x_j Z_j \sum_k y_k Z_{jk} = \sum_j x_j Z_j\cdot Z_j = \mathbb E[(Z_j)^2],$$
and similarly $\mathbb E[Z_k Z_{jk}] = \mathbb E[(Z_k)^2]$.

Proof of Theorem 15. By Lemma 4, it suffices to prove that $c^{(A,B)}(S_\delta) = \inf_{(p,q)\in S_\delta} C^{(A,B)}(p,q) \ge \theta^2\delta^4$.
This holds because the matrix $Z$ $\theta$-dominates $C$, which implies there exist $j, j', k, k'$ such that
$$\big(Z_{jk} + Z_{j'k'} - Z_{jk'} - Z_{j'k}\big)^2 \ge \big(C_{jk} + C_{j'k'} - C_{jk'} - C_{j'k}\big)^2 + \theta^2.$$
By applying Observation 13, $C^{(Z,-Z)}(p,q) \ge C^{(C,-C)}(p,q) + \theta^2\delta^4$, because for $(p,q)\in S_\delta$, each of $x_j(p), y_k(q), x_{j'}(p), y_{k'}(q)$ is at least $\delta$. Noting that $C^{(C,C)}(p,q) = -C^{(C,-C)}(p,q)$ and $C^{(A,B)}(p,q) = C^{(Z,-Z)}(p,q) + C^{(C,C)}(p,q)$, the result follows.

Proof of Lemma 16. A key observation is the following equality:
$$C^{(K,-K)}(p,q) = \min_{g,h} F(g,h), \quad\text{where}\quad F(g,h) = \sum_{j,k} x_j(p)\cdot y_k(q)\cdot (K_{jk} - g_j - h_k)^2. \tag{8}$$
Recall the notations $K_j = \sum_k y_k K_{jk}$ and $K_k = \sum_j x_j K_{jk}$, and let $e := \sum_{j,k} x_j y_k K_{jk} \equiv \mathbb E[K_{jk}]$. Equality (8) holds due to the following observations: (i) $F(g,h)$ is a smooth convex function of its variables, thus all minimum points have the same function value; (ii) if $\partial F/\partial g_j$ and $\partial F/\partial h_k$ are all zero at some point $(g,h)$, then that point is a minimum point of $F$; (iii) $C^{(K,-K)}(p,q)$ is the variance of the random variable $K_{jk} - K_j - K_k$ (see the end of Section 2), and thus, by a standard formula for the variance,
$$C^{(K,-K)}(p,q) = \mathbb E\Big[\big(K_{jk} - K_j - K_k - \mathbb E[K_{jk} - K_j - K_k]\big)^2\Big] = \mathbb E\Big[\big(K_{jk} - K_j - K_k + e\big)^2\Big]$$
(since $\mathbb E[K_{jk} - K_j - K_k] = -\mathbb E[K_{jk}] = -e$)
$$= \sum_{j,k} x_j y_k \big(K_{jk} - (K_j - e) - K_k\big)^2 = F(g^\#, h^\#),$$
where $g^\#_j = K_j - e$ and $h^\#_k = K_k$; and (iv) at $(g^\#, h^\#)$, the partial derivatives stated in (ii) are all zero.

Comparing this observation with the definition of $r(K)$, it follows that $C^{(K,-K)}(p,q) \le r(K)^2$. To see that $C^{(K,-K)}(p,q) \ge (r(K)\cdot\delta)^2$, let $g^*$ and $h^*$ be an optimal choice of $g$ and $h$ in $C^{(K,-K)}(p,q) = \min_{g,h}\sum_{j,k} x_j(p)\cdot y_k(q)\cdot(K_{jk} - g_j - h_k)^2$. Due to the specification of the linear program (7), we have (see the footnote below)
$$2\cdot r(K) \le \max_{j,k}\{K_{jk} - g^*_j - h^*_k\} - \min_{j,k}\{K_{jk} - g^*_j - h^*_k\}.$$
Therefore,
$$\max\Big\{\big(\max_{j,k}\{K_{jk} - g^*_j - h^*_k\}\big)^2,\; \big(\min_{j,k}\{K_{jk} - g^*_j - h^*_k\}\big)^2\Big\} \ge r(K)^2.$$
This immediately implies that $C^{(K,-K)}(p,q) \ge (r(K)\cdot\delta)^2$, since in $S_\delta$ the entry attaining the above maximum carries probability weight $x_j(p)\cdot y_k(q) \ge \delta^2$.

Footnote: if this inequality did not hold, we could take $\tilde g_j = g^*_j + \frac{\max_{j,k}\{K_{jk}-g^*_j-h^*_k\} + \min_{j,k}\{K_{jk}-g^*_j-h^*_k\}}{2}$ and $\tilde h_k = h^*_k$ as a feasible choice in the linear program defining $r(K)$; it achieves the value $\frac{\max_{j,k}\{K_{jk}-g^*_j-h^*_k\} - \min_{j,k}\{K_{jk}-g^*_j-h^*_k\}}{2} < r(K)$, contradicting the optimality of $r(K)$.

We first recap how the volume change is computed for dynamical systems which are gradual (i.e., those governed by a small step-size), followed by a continuous-time analogue of OMWU in games, which is crucial for analyzing the volume change of discrete-time OMWU. Then we compute the volume changes of MWU and OMWU in multi-player graphical games and normal-form games respectively. Once these are done, the proofs of Proposition 20 and Theorem 17 become apparent.
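Equality (8) in the proof of Lemma 16 above can be sanity-checked numerically. The sketch below (illustrative numbers; all helper names are ours) verifies that $F(g^\#, h^\#)$ with $g^\#_j = K_j - e$ and $h^\#_k = K_k$ equals the variance of $K_{jk} - K_j - K_k$, and that random perturbations of $(g^\#, h^\#)$ never attain a smaller value of $F$:

```python
import random

random.seed(0)
K = [[1.0, -2.0, 0.5], [0.3, 2.0, -1.0]]   # an arbitrary 2x3 payoff matrix
x, y = [0.3, 0.7], [0.2, 0.5, 0.3]         # mixed strategies of the two players

Kj = [sum(y[k] * K[j][k] for k in range(3)) for j in range(2)]   # K_j = [Ky]_j
Kk = [sum(x[j] * K[j][k] for j in range(2)) for k in range(3)]   # K_k = [K^T x]_k
e = sum(x[j] * Kj[j] for j in range(2))                           # e = E[K_jk]

def F(g, h):
    return sum(x[j] * y[k] * (K[j][k] - g[j] - h[k]) ** 2
               for j in range(2) for k in range(3))

# variance of K_jk - K_j - K_k, i.e. C^{(K,-K)}(p,q)
var = sum(x[j] * y[k] * (K[j][k] - Kj[j] - Kk[k] + e) ** 2
          for j in range(2) for k in range(3))

g_sharp = [Kj[j] - e for j in range(2)]
h_sharp = Kk[:]
assert abs(F(g_sharp, h_sharp) - var) < 1e-9

# (g#, h#) is a global minimum of the convex F: random perturbations never do better
for _ in range(200):
    g = [g_sharp[j] + random.uniform(-1, 1) for j in range(2)]
    h = [h_sharp[k] + random.uniform(-1, 1) for k in range(3)]
    assert F(g, h) >= F(g_sharp, h_sharp) - 1e-12
```

This matches observations (iii) and (iv): the stationary point built from the row/column averages attains exactly the variance value.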

D.1 DISCRETE-TIME DYNAMICAL SYSTEMS AND VOLUME OF FLOW

We consider discrete-time dynamical systems in $\mathbb R^d$. Such a dynamical system is determined recursively by a starting point $s(0)\in\mathbb R^d$ and an update rule of the form $s(t+1) = G(s(t))$, for some function $G:\mathbb R^d\to\mathbb R^d$. Here, we focus on the special case when the update rule is gradual, i.e. it is of the form $s(t+1) = s(t) + \epsilon\cdot F(s(t))$, where $F:\mathbb R^d\to\mathbb R^d$ is a smooth function and $\epsilon > 0$ is the step-size. When $F$ and $\epsilon$ are given, the flow of the starting point $s(0)$ at time $t$, denoted by $\Phi(t, s(0))$, is simply the point $s(t)$ generated by the above recursive update rule. The flow of a set $S\subset\mathbb R^d$ at time $t$, denoted by $\Phi(t, S)$, is the set $\{\Phi(t,s)\mid s\in S\}$. Since $F$ does not depend on time $t$, we have the following equality: $\Phi(t_1+t_2, S) = \Phi(t_2, \Phi(t_1, S))$. Equipping $\mathbb R^d$ with the standard Lebesgue measure, the volume of a measurable set $S$, denoted by $\mathrm{vol}(S)$, is simply its measure. Given a bounded and measurable set $S\subset\mathbb R^d$, if the discrete flow in one time step maps $S$ to $S' = \Phi(1, S)$ injectively, then by integration by substitution for multiple variables,
$$\mathrm{vol}(S') = \int_{s\in S} \det\big(I + \epsilon\cdot J(s)\big)\, dV, \tag{9}$$
where $I$ is the identity matrix, and $J(s)$ is the Jacobian matrix of $F$ at $s$:
$$J(s) = \begin{bmatrix} \frac{\partial F_1}{\partial s_1}(s) & \frac{\partial F_1}{\partial s_2}(s) & \cdots & \frac{\partial F_1}{\partial s_d}(s) \\ \frac{\partial F_2}{\partial s_1}(s) & \frac{\partial F_2}{\partial s_2}(s) & \cdots & \frac{\partial F_2}{\partial s_d}(s) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_d}{\partial s_1}(s) & \frac{\partial F_d}{\partial s_2}(s) & \cdots & \frac{\partial F_d}{\partial s_d}(s) \end{bmatrix}. \tag{10}$$
Clearly, analyzing the determinant in the integrand of Eqn. (9) is crucial in volume analysis; we call it the volume integrand. When the determinant is expanded using the Leibniz formula, it becomes a polynomial in $\epsilon$, of the form $1 + C(s)\cdot\epsilon^h + O(\epsilon^{h+1})$ for some integer $h \ge 1$. Thus, when the step-size $\epsilon$ is sufficiently small, the sign of $C(s)$ dictates whether the volume expands or contracts.
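As a concrete illustration of the volume integrand: for a Jacobian with zero diagonal blocks (as will be the case for MWU below), the Leibniz expansion of $\det(I + \epsilon J)$ has no $O(\epsilon)$ term, and its $\epsilon^2$ coefficient is $-\frac12\mathrm{tr}(J^2)$. The sketch below (toy numbers and our own determinant helper, not from the paper) checks this numerically:

```python
def det(M):
    # Laplace expansion along the first row; fine for tiny matrices
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in M[1:]]
        total += ((-1) ** c) * M[0][c] * det(minor)
    return total

# a toy Jacobian with zero 2x2 diagonal blocks, as for MWU in a bimatrix game
J = [[0.0, 0.0, 0.3, -0.1],
     [0.0, 0.0, 0.2, 0.4],
     [-0.2, 0.1, 0.0, 0.0],
     [0.5, -0.3, 0.0, 0.0]]

tr = sum(J[i][i] for i in range(4))                                  # = 0: no O(eps) term
trJ2 = sum(J[i][k] * J[k][i] for i in range(4) for k in range(4))
C = -trJ2 / 2    # coefficient of eps^2 in det(I + eps*J)

for eps in (1e-2, 1e-3):
    M = [[(1.0 if i == k else 0.0) + eps * J[i][k] for k in range(4)]
         for i in range(4)]
    assert abs(det(M) - (1 + C * eps ** 2)) < 10 * eps ** 3
```

Here $C > 0$, so for small step-sizes this toy map expands volume locally, exactly the mechanism the volume analysis exploits.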

D.2 CONTINUOUS-TIME ANALOGUE OF OMWU

OMWU does not fall into the category of dynamical systems defined above, since its update rule is of the form $s(t+1) = G(s(t), s(t-1))$. Fortunately, Cheung & Piliouras (2020) showed that OMWU can be well-approximated by the online Euler discretization of a system of ordinary differential equations (ODE), and thus it can be well-approximated by a dynamical system. The ODE system is given below: $p$ is a dual (cumulative payoff) vector variable, and $u:\mathbb R^+\to\mathbb R^d$ is the function such that $u(t)$ gives the instantaneous payoff vector at time $t$. We assume that $u$ is twice differentiable with bounded second derivatives, and $\dot u$ denotes the time-derivative of $u$:
$$\dot p = u + \epsilon\cdot\dot u. \tag{11}$$
Online Euler discretization (OED) of Eqn. (11) refers to the following time-discretization of the ODE system. In applications, $u$ might not be explicitly given; rather, the sequence $u(0), u(1), u(2), \ldots$ becomes available online (i.e., at time $t$ we only have access to $u(\tau)$ for $\tau = 0, 1, \ldots, t$). As the discretization step is $\epsilon$, we approximate $\dot u(t)$ by $(u(t) - u(t-1))/\epsilon$. Using this approximation, the OED of Eqn. (11) yields
$$p(t+1) = p(t) + \epsilon\cdot u(t) + \epsilon\cdot\big[u(t) - u(t-1)\big] = p(t) + \epsilon\cdot\big[2\,u(t) - u(t-1)\big],$$
which is exactly the OMWU update rule in the general context. Compared with the standard Euler discretization $p(t+1) = p(t) + \epsilon\cdot[u(t) + \epsilon\cdot\dot u(t)]$, the OED incurs a local error due to the approximation of $\dot u(t)$; this local error can be bounded by $O(\epsilon^3)$. Cheung & Piliouras (2020) showed that, since the determinant of the volume integrand is eventually of the form $1 + C(s)\cdot\epsilon^2 + O(\epsilon^3)$, the local error does not affect the two highest-order terms, and hence can be ignored henceforth.
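The algebra showing that one OED step coincides with the OMWU update can be checked directly; the numbers below are purely illustrative:

```python
eps = 0.1
p = 2.0
u_prev, u_curr = 0.7, 1.3   # u(t-1) and u(t), an arbitrary payoff history

# OED step: p + eps*u(t) + eps*(u(t) - u(t-1)),
# where the second term approximates eps^2 * du/dt
oed = p + eps * u_curr + eps * (u_curr - u_prev)

# OMWU update rule: p + eps * (2u(t) - u(t-1))
omwu = p + eps * (2 * u_curr - u_prev)

assert abs(oed - omwu) < 1e-12  # the two coincide exactly
```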

D.3 MWU IN GRAPHICAL GAMES

Let $H$ be a graphical game of $N$ players, where between every pair of Players $i$ and $k$ the payoff bimatrix is $(H^{ik}, (H^{ki})^T)$. In the cumulative payoff space, let $p = (p_1, \ldots, p_N)$ denote the cumulative payoff profile, and let $x = (x_1, \ldots, x_N)$ denote the corresponding mixed strategy profile, where $x_i$ is a function of $p_i$; we write $x_i$ and $x_i(p_i)$ interchangeably. The expected payoff to strategy $j$ of Player $i$ is
$$u_{ij}(p) = \sum_{k\in[N],\,k\neq i} \big[H^{ik}\cdot x_k(p_k)\big]_j,$$
which will be used to compute the Jacobian matrices of MWU and OMWU. For MWU, the Jacobian matrix $J$ is a square matrix whose rows and columns are indexed by pairs $(i,j)$, where $i$ is a player and $j\in S_i$. The precise values of its entries are:
$$\forall j_1, j_2\in S_i,\quad J_{(i,j_1),(i,j_2)} = \epsilon\cdot\frac{\partial u_{ij_1}}{\partial p_{ij_2}} = 0, \tag{12}$$
and
$$\forall i\neq k,\; j\in S_i,\; \ell\in S_k,\quad J_{(i,j),(k,\ell)} = \epsilon\cdot\frac{\partial u_{ij}}{\partial p_{k\ell}} = \epsilon\cdot x_{k\ell}\cdot\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big). \tag{13}$$
Then, by expansion using the Leibniz formula, the determinant of $(I + J)$ is
$$1 - \sum_{i\in[N]}\sum_{j\in S_i}\sum_{k>i}\sum_{\ell\in S_k} J_{(i,j),(k,\ell)}\, J_{(k,\ell),(i,j)} + O(\epsilon^3) = 1 - \epsilon^2\sum_{i\in[N]}\sum_{j\in S_i}\sum_{k>i}\sum_{\ell\in S_k} x_{ij}\, x_{k\ell}\Big(H^{ki}_{\ell j} - \big[H^{ki}\cdot x_i\big]_\ell\Big)\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big) + O(\epsilon^3). \tag{14}$$
By noting the similarity of the double summation to $C^{(A,B)}(\cdot)$ in Eqn. (4), we can immediately rewrite the above expression as
$$1 + \epsilon^2\sum_{1\le i<k\le N} C^{(H^{ik},(H^{ki})^T)}(p_i, p_k) + O(\epsilon^3). \tag{15}$$
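The Jacobian entry formula, i.e. that the derivative of the expected payoff $[H^{ik} x_k(p_k)]_j$ with respect to $p_{k\ell}$ equals $x_{k\ell}\,(H^{ik}_{j\ell} - [H^{ik} x_k]_j)$, can be verified against finite differences. The sketch below (illustrative matrix and payoff vector; helper names are ours) does this for a single edge-game, with $x_k$ given by the softmax of $p_k$:

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

H = [[1.0, -0.5, 2.0], [0.0, 1.5, -1.0]]   # illustrative edge-game matrix H^{ik}
p_k = [0.4, -0.2, 0.9]                      # Player k's cumulative payoff vector

def u(p):
    # expected payoff vector of Player i: u_ij = [H^{ik} x_k(p_k)]_j
    x = softmax(p)
    return [sum(H[j][l] * x[l] for l in range(3)) for j in range(2)]

x = softmax(p_k)
max_err = 0.0
for j in range(2):
    for l in range(3):
        # analytic entry: x_kl * (H^{ik}_{jl} - [H^{ik} x_k]_j)
        analytic = x[l] * (H[j][l] - sum(H[j][m] * x[m] for m in range(3)))
        d = 1e-6
        pp = p_k[:]
        pp[l] += d
        numeric = (u(pp)[j] - u(p_k)[j]) / d   # forward finite difference
        max_err = max(max_err, abs(analytic - numeric))
assert max_err < 1e-4
```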

D.4 OMWU IN GRAPHICAL GAMES

For OMWU, as pointed out already, we first consider its continuous-time analogue; thus, we need to compute $\dot u$ in the continuous-time setting. By the chain rule,
$$\dot u_{ij}(p) = \sum_{k\in[N],\,k\neq i}\,\sum_{\ell\in S_k} \frac{\partial\big[H^{ik}\cdot x_k(p_k)\big]_j}{\partial p_{k\ell}}\cdot\frac{dp_{k\ell}}{dt} = \sum_{k\in[N],\,k\neq i}\,\sum_{\ell\in S_k} x_{k\ell}\cdot\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big)\cdot\frac{dp_{k\ell}}{dt},$$
and hence
$$\frac{dp_{ij}}{dt} = \sum_{k\neq i}\big[H^{ik}\cdot x_k\big]_j + \epsilon\cdot\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\cdot\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big)\cdot\frac{dp_{k\ell}}{dt}.$$
Note that this is a recurrence formula for $dp/dt$. By iterating it foot_10, we have
$$\frac{dp_{ij}}{dt} = \sum_{k\neq i}\big[H^{ik}\cdot x_k\big]_j + \epsilon\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big)\Big(\sum_{r\neq k}\big[H^{kr}\cdot x_r\big]_\ell\Big) + O(\epsilon^2).$$
Hence, its standard Euler discretization, which approximates the OED with local error $O(\epsilon^3)$, can be written as below (where we ignore the $O(\epsilon^3)$ error terms):
$$p_{ij}(t+1) = p_{ij}(t) + \epsilon\sum_{k\neq i}\big[H^{ik}\cdot x_k\big]_j + \epsilon^2\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big)\Big(\sum_{r\neq k}\big[H^{kr}\cdot x_r\big]_\ell\Big).$$
With this, we are ready to compute the Jacobian matrix $J$ for OMWU. For all $j_1, j_2\in S_i$,
$$J_{(i,j_1),(i,j_2)} = \epsilon^2\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\Big(H^{ik}_{j_1\ell} - \big[H^{ik}\cdot x_k\big]_{j_1}\Big)\cdot x_{ij_2}\Big(H^{ki}_{\ell j_2} - \big[H^{ki}\cdot x_i\big]_\ell\Big), \tag{16}$$
and for all $i\neq k$, $j\in S_i$, $\ell\in S_k$,
$$J_{(i,j),(k,\ell)} = \epsilon\, x_{k\ell}\Big(H^{ik}_{j\ell} - \big[H^{ik}\cdot x_k\big]_j\Big) + O(\epsilon^2). \tag{17}$$
Then, by expansion using the Leibniz formula, the determinant of $(I + J)$ is
$$1 + \underbrace{\sum_{i\in[N]}\sum_{j\in S_i} J_{(i,j),(i,j)}}_{T_1} - \underbrace{\sum_{i\in[N]}\sum_{j\in S_i}\sum_{k>i}\sum_{\ell\in S_k} J_{(i,j),(k,\ell)}\,J_{(k,\ell),(i,j)}}_{T_2} + O(\epsilon^3).$$
By direct expansion of $T_1$ and $T_2$, it is easy to see that $T_1 = 2T_2$ (after ignoring $O(\epsilon^3)$ terms). On the other hand, the coefficient of $\epsilon^2$ in $T_2$ is exactly the double summation in Eqn. (14), so $T_2 = -\epsilon^2\sum_{1\le i<k\le N} C^{(H^{ik},(H^{ki})^T)}(p_i,p_k)$. Overall, we show that the determinant equals
$$1 - \epsilon^2\sum_{1\le i<k\le N} C^{(H^{ik},(H^{ki})^T)}(p_i, p_k) + O(\epsilon^3). \tag{18}$$
Observation 22. The coefficient of $\epsilon^2$ in Eqn. (18) is the exact negation of the coefficient of $\epsilon^2$ in Eqn. (15).

D.5 COMPLETING THE LOCAL EQUIVALENCE PROOF

In a multi-player normal-form game $G$, recall the notation of Eqn. (1). We point out the following formulae:
$$\frac{\partial U^{i_1 i_2\cdots i_g}_{j_1 j_2\cdots j_g}}{\partial p_{ij}} = 0 \;\;\text{if } i\in\{i_1, i_2, \ldots, i_g\};\qquad \frac{\partial U^{i_1 i_2\cdots i_g}_{j_1 j_2\cdots j_g}}{\partial p_{ij}} = x_{ij}\cdot\Big(U^{i_1\cdots i_g i}_{j_1\cdots j_g j} - U^{i_1\cdots i_g}_{j_1\cdots j_g}\Big) \;\;\text{if } i\notin\{i_1, i_2, \ldots, i_g\}.$$

MWU. Here, the MWU update rule is $p_{ij}(t+1) = p_{ij}(t) + \epsilon\cdot U^i_j$. When computing the Jacobian matrix of this update rule using the formulae above, and comparing it with the Jacobian matrix computed in Eqn. (12) and Eqn. (13), it is immediate that they are the same upon setting $H^{ik}_{j\ell} = U^{ik}_{j\ell}$. This derives Eqn. (5), and completes the proof of Theorem 19.

OMWU. As before, we use the continuous-time analogue and compute $\dot u$. By the chain rule and the formulae above,
$$\dot u_{ij}(p) = \sum_{k\in[N],\,k\neq i}\,\sum_{\ell\in S_k} \frac{\partial U^i_j}{\partial p_{k\ell}}\cdot\frac{dp_{k\ell}}{dt} = \sum_{k\in[N],\,k\neq i}\,\sum_{\ell\in S_k} x_{k\ell}\cdot\big(U^{ik}_{j\ell} - U^i_j\big)\cdot\frac{dp_{k\ell}}{dt},$$
and hence
$$\frac{dp_{ij}}{dt} = U^i_j + \epsilon\cdot\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\cdot\big(U^{ik}_{j\ell} - U^i_j\big)\cdot\frac{dp_{k\ell}}{dt}.$$
Iterating the above recurrence yields
$$\frac{dp_{ij}}{dt} = U^i_j + \epsilon\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\big(U^{ik}_{j\ell} - U^i_j\big)\cdot U^k_\ell + O(\epsilon^2).$$
Its standard Euler discretization is
$$p_{ij}(t+1) = p_{ij}(t) + \epsilon\cdot U^i_j + \epsilon^2\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\big(U^{ik}_{j\ell} - U^i_j\big)\cdot U^k_\ell.$$
Now we compute the Jacobian matrix of this standard Euler discretization. For $j_1, j_2\in S_i$,
$$J_{(i,j_1),(i,j_2)} = \epsilon^2\sum_{k\neq i}\sum_{\ell\in S_k} x_{k\ell}\big(U^{ik}_{j_1\ell} - U^i_{j_1}\big)\cdot x_{ij_2}\big(U^{ki}_{\ell j_2} - U^k_\ell\big),$$
and for all $i\neq k$, $j\in S_i$, $\ell\in S_k$,
$$J_{(i,j),(k,\ell)} = \epsilon\, x_{k\ell}\big(U^{ik}_{j\ell} - U^i_j\big) + O(\epsilon^2).$$
Comparing this computed Jacobian matrix with the Jacobian matrix computed in Eqn. (16) and Eqn. (17), it is immediate to see that their determinants are the same (after ignoring all $O(\epsilon^3)$ terms) upon setting $H^{ik}_{j\ell} = U^{ik}_{j\ell}$. With the result we just derived, together with Observation 22 and Theorem 19, Proposition 20 follows.

E MULTI-PLAYER POTENTIAL GAME

Proof of Proposition 21. We know that the potential game satisfies the following condition: $P(s_i, s_{-i}) - P(s'_i, s_{-i}) = u^i(s_i, s_{-i}) - u^i(s'_i, s_{-i})$. Therefore, $u^i(s_i, s_{-i}) = P(s_i, s_{-i}) + v^i(s_{-i})$; note that $v^i(s_{-i})$ does not depend on $s_i$, the strategy of Player $i$. Per Theorem 19, let $H(U)$ be the induced graphical game of $U$ and $H(U_P)$ be the induced graphical game of $U_P$. Then
$$C^U(p) = C^{H(U)}(p) \quad\text{(Theorem 19)}$$
$$= \sum_{i,k} C^{(H(U)^{ik},\,(H(U)^{ki})^T)}(p_i, p_k) \quad\text{(by Eqn. (15))}$$
$$= \sum_{i,k} C^{(H(U_P)^{ik},\,(H(U_P)^{ki})^T)}(p_i, p_k) \quad\text{(see explanation below)}$$
$$= C^{H(U_P)}(p) \quad\text{(by Eqn. (15))} \;=\; C^{U_P}(p).$$
The third equality holds as the difference between $H(U)^{ik}$ and $H(U_P)^{ik}$ is a trivial matrix:
$$H(U)^{ik}_{j\ell} = U^{ik}_{j\ell} = (U_P)^{ik}_{j\ell} + E_{-(i,k)}\big[v^i(s_{-i})\big] = H(U_P)^{ik}_{j\ell} + E_{-(i,k)}\big[v^i(s_{-i})\big],$$
where (foot_13) $E_{-(i,k)}[v^i(s_{-i})]$ does not depend on $j$, the strategy of Player $i$, and depends only on $\ell$, the strategy of Player $k$. The same argument applies to $(H(U)^{ki})^T$ and $(H(U_P)^{ki})^T$.

To see $C^U(p) \le 0$, observe that the induced edge-game of $U_P$ between Players $i$ and $k$, $(H(U_P)^{ik}, (H(U_P)^{ki})^T)$, is also a bimatrix coordination game, which implies $C^{(H(U_P)^{ik},\,(H(U_P)^{ki})^T)}(\cdot) \le 0$. As $C^U(p) = \sum_{i,k} C^{(H(U_P)^{ik},\,(H(U_P)^{ki})^T)}(p_i, p_k)$, the result follows.

Next, we identify several cases in which $C^U(p)$ is strictly negative in the region $S_\delta = \{x\mid \forall i,j,\; x_{ij} > \delta\}$. The conditions we pose are on the corresponding potential function $P$. Note that $H(U_P)^{ik}$, the induced edge-game between Players $i$ and $k$, is also a coordination game, i.e. $H(U_P)^{ik} = (H(U_P)^{ki})^T$.

• Case 1: $\min_{x,g,h}\;\sum_{1\le i<k\le N}\;\sum_{j\in S_i,\,\ell\in S_k} \big(P^{ik}_{j\ell} - g^{ik}_j - h^{ik}_\ell\big)^2 \ge \theta$, where $P^{ik}_{j\ell} = E_{s_{-(i,k)}}\big[P(s_i = j, s_k = \ell, s_{-(i,k)})\big]$. Under this condition, we can prove that $C^U(p) \le -\theta\delta^2$ for any $p$ in $S_\delta$.
One key observation behind this is that
$$C^U(p) = \sum_{i,k} C^{(H(U_P)^{ik},\,(H(U_P)^{ki})^T)}(p_i, p_k) = -\sum_{i,k}\sum_{j,\ell} x_{ij}(p_i)\, x_{k\ell}(p_k)\,\big(P^{ik}_{j\ell} - g^{ik}_j - h^{ik}_\ell\big)^2,$$
where $g^{ik}, h^{ik}$ are the minimizers, as $H(U_P)^{ik} = (U_P)^{ik} = P^{ik}$.

• Case 2: If $U$ is a graphical game and there exists a pair of Players $i_1$ and $i_2$ such that the edge-game between $i_1$ and $i_2$ is non-trivial, then $C^U$ is strictly negative in $S_\delta$.

• Case 3: Consider the payoff matrix of $U_P$, the coordination game, between Players $i_1$ and $i_2$, given a strategy profile of the other players. There are in total $\prod_{i\neq i_1,i_2} n_i$ such matrices, one for each strategy profile of the other players, and each matrix is of dimension $n_{i_1}\times n_{i_2}$. We call these the projected matrices for Players $i_1, i_2$. Let $\mathcal M$ denote the space of $n_{i_1}\times n_{i_2}$ matrices. The trivial matrices form a subspace of dimension $n_{i_1}+n_{i_2}-1$ (foot_14); call this the trivial space, denoted by $\mathcal T$. We consider a direct decomposition $\mathcal M = \mathcal T\oplus\mathcal V$. Let $B_1, B_2, B_3, \ldots, B_{n_{i_1} n_{i_2}}$ be a basis of $\mathcal M$, where the first $n_{i_1}+n_{i_2}-1$ elements form a basis of $\mathcal T$ and the remaining elements form a basis of $\mathcal V$. Without loss of generality, we assume all basis elements have norm 1 (foot_15).

Given the above-mentioned basis of $\mathcal M$, each projected matrix can be written as a unique linear combination of the basis elements. Now, suppose there is a basis element $B_\ell$ with $\ell \ge n_{i_1}+n_{i_2}$ (i.e., $B_\ell$ belongs to the basis of $\mathcal V$), such that all projected matrices have non-positive (resp. non-negative) coefficients of $B_\ell$, and at least one projected matrix (which we call a special projected matrix) has a strictly negative (resp. strictly positive) coefficient of $B_\ell$. Then we claim that $C^{(H(U_P)^{i_1 i_2},\,(H(U_P)^{i_2 i_1})^T)}(p_{i_1}, p_{i_2})$ is strictly negative in $S_\delta$. This is because $H(U_P)^{i_1 i_2}$ is a convex combination of all those projected matrices, and by our assumption above, when $H(U_P)^{i_1 i_2}$ is expressed as a linear combination of the basis elements of $\mathcal M$, the coefficient of $B_\ell$ is strictly negative (resp. strictly positive); thus $H(U_P)^{i_1 i_2}$ cannot be a trivial matrix. Suppose further that there exists $\theta > 0$ such that a special projected matrix has coefficient of $B_\ell$ smaller than $-\theta$ (resp. larger than $\theta$). Then $H(U_P)^{i_1 i_2}$ is bounded away from $\mathcal T$ by a distance of $\theta\delta^{N-2}$ (foot_11), and hence, as the calculation below shows, $C^{(H(U_P)^{i_1 i_2},\,(H(U_P)^{i_2 i_1})^T)}(p_{i_1}, p_{i_2}) \le -\theta^2\delta^{2N-2}$. If there exists a pair of Players $i_1$ and $i_2$ for which this condition holds, then $C^U \le -\theta^2\delta^{2N-2}$. The calculation:
$$C^{(H(U_P)^{i_1 i_2},\,(H(U_P)^{i_2 i_1})^T)}(p_{i_1}, p_{i_2}) = -\min_{g,h}\sum_{j,\ell} x_{i_1 j}(p_{i_1})\, x_{i_2\ell}(p_{i_2})\,\Big(H(U_P)^{i_1 i_2}_{j\ell} - g_j - h_\ell\Big)^2$$
$$\le -\min_{g,h}\,\delta^2\sum_{j,\ell}\Big(H(U_P)^{i_1 i_2}_{j\ell} - g_j - h_\ell\Big)^2 = -\delta^2\sum_{j,\ell}\Big(H(U_P)^{i_1 i_2}_{j\ell} - g^*_j - h^*_\ell\Big)^2 \le -\delta^2\big(\theta\delta^{N-2}\big)^2,$$
where $\{g^*_j + h^*_\ell\}_{j,\ell}$ is the projection of $H(U_P)^{i_1 i_2}$ onto the trivial space. The first inequality holds as $p\in S_\delta$; the middle equality holds as the projection minimizes the distance to the trivial space; and the final inequality follows from the distance of $H(U_P)^{i_1 i_2}$ to the trivial space. In all these cases, $C^U$ is strictly negative in the domain $S_\delta$, which implies that OMWU is Lyapunov chaotic in $S_\delta$.

Footnote 13: $E_{-(i,k)}[v^i(s_{-i})]$ is the expectation over all the strategies taken by the players other than $i$ and $k$; $v^i(s_{-i})$ does not depend on the strategy taken by Player $i$.
Footnote 14: Recall that a trivial matrix $T$ can be represented as $\{u_j + v_k\}_{j,k}$. Consider the natural linear map $L$ such that $L(u_1, \ldots, u_{n_{i_1}}, v_1, \ldots, v_{n_{i_2}})$ maps to the trivial matrix $T$. The kernel of $L$ is of dimension 1: if $L(u_1, \ldots, u_{n_{i_1}}, v_1, \ldots, v_{n_{i_2}})$ is the zero matrix, then we must have $v_k = -u_j$ for all $j, k$, and hence the kernel of $L$ is the span of the vector $(1, \ldots, 1, -1, \ldots, -1)$ (the $u$ part all ones, the $v$ part all minus ones). Thus, the dimension of the space of trivial matrices is the dimension of the domain of $L$, which is $n_{i_1}+n_{i_2}$, minus the dimension of the kernel of $L$.
Footnote 15: Here, the norm is defined w.r.t. the standard Frobenius matrix inner product.
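Proposition 21's conclusion $C^U(p) \le 0$ can be checked numerically for a small identical-interest game. The sketch below is our own construction (a hypothetical 3-player, 2-strategy game where every player receives the potential $P$): it builds the induced edge-games of Theorem 19 and sums the per-edge terms of Eqn. (15); each term is minus a variance, so the total is non-positive:

```python
import itertools

# a 3-player, 2-strategy identical-interest game: every player receives P(s)
P = {s: ((-1) ** sum(s)) * (1 + 0.3 * s[0] + 0.2 * s[1] - 0.5 * s[2])
     for s in itertools.product(range(2), repeat=3)}
x = [[0.3, 0.7], [0.6, 0.4], [0.5, 0.5]]   # mixed strategies (any interior point works)

def edge_matrix(i, k):
    # U^{ik}_{jl}: expectation of P over the remaining player's mixed strategy
    m = [t for t in range(3) if t not in (i, k)][0]
    M = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        for l in range(2):
            s = [0, 0, 0]
            s[i], s[k] = j, l
            M[j][l] = sum(x[m][a] * P[tuple(s[:m] + [a] + s[m + 1:])]
                          for a in range(2))
    return M

def C_coord(M, xi, xk):
    # For a coordination edge-game (M, M^T): C = -Var(M_jl - M_j - M_l),
    # written as a centered sum of squares, hence always <= 0.
    Mr = [sum(xk[l] * M[j][l] for l in range(2)) for j in range(2)]
    Mc = [sum(xi[j] * M[j][l] for j in range(2)) for l in range(2)]
    E = sum(xi[j] * Mr[j] for j in range(2))
    return -sum(xi[j] * xk[l] * (M[j][l] - Mr[j] - Mc[l] + E) ** 2
                for j in range(2) for l in range(2))

CU = sum(C_coord(edge_matrix(i, k), x[i], x[k])
         for i in range(3) for k in range(i + 1, 3))
assert CU < 0  # strictly negative here, since the induced edge-games are non-trivial
```

By Proposition 20, a strictly negative $C^U$ at interior points means OMWU locally expands volume there, the mechanism behind the chaos claim.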



Footnotes:
• In games, such perturbations can also occur due to errors in measuring payoffs. For instance, in learning algorithms we often evaluate the function $e^x$, while $e$ is an irrational number.
• They also showed some other chaos results for various combinations of learning algorithms and games.
• The families of zero-sum and coordination games are proper subspaces of the bimatrix game space; any proper subspace has Lebesgue measure zero.
• The theorem and propositions mentioned in this paragraph are stated formally in Appendix C.
• The probability simplex over $n$ strategies is $(n-1)$-dimensional.
• Rigorously, OMWU in a game is not a dynamical system, as the update to $p(t+1)$ depends on both $p(t)$ and $p(t-1)$. But there is a function $f$ such that $p(t+1)\approx f(p(t))$, while the volume-changing behavior is not really affected (Cheung & Piliouras (2020)).
• We note that many linear dynamical systems admit simple closed-form solutions (e.g. $dx/dt = x$ has solution $x(t) = x(0)\cdot e^t$), yet they are considered Lyapunov chaotic under this definition. This might not match the intuitive meaning of "chaos" for many people. However, for non-linear dynamical systems, which mostly do not admit closed-form solutions (including all systems we study), Lyapunov chaos is well-received as a notion that captures unpredictability.
• They showed that, in the cumulative payoff space, when the set $S^t$ is evolved to $S^{t+1}$ after one MWU step, $\mathrm{volume}(S^{t+1}) = \mathrm{volume}(S^t) + \epsilon^2\int_{(p,q)\in S^t} C^{(A,B)}(p,q)\, dp\, dq + O(\epsilon^4)$. Thus, if $C^{(A,B)}(p,q) > 0$, then the volume is increasing, which indicates diverging trajectories, i.e. chaos.
• The reduced payoff space is a projection of the original payoff space: the horizontal axis is $p_1 - p_2$, while the vertical axis is $q_1 - q_2$. The original cumulative payoff space is four-dimensional, which is difficult to depict graphically; this is why we use the reduced space instead.
• For the formality of why we can do such iterations when $\epsilon$ is sufficiently small, see Cheung & Piliouras (2020).
• To see why: when the coefficient of $B_\ell$ is bounded away from zero, the special projected matrix has a strictly positive distance from $\mathcal T$, and this distance is at least $\theta$. Then $H(U_P)^{i_1 i_2}$, which is a convex combination of all projected matrices in which each projected matrix (in particular, the special projected matrix) has a weight at least $\delta^{N-2}$, also has a strictly positive distance from $\mathcal T$, which is at least $\theta\delta^{N-2}$.



At this point, we still cannot easily determine which of $C^{(Z,-Z)}(\cdot)$ and $C^{(C,-C)}(\cdot)$ is larger. However, we can further decompose the coordination part by the second tool: C =

Figure 1: Volume expansion of MWU in the bimatrix game $\big(\begin{smallmatrix}0.95&0.03\\0.03&0.95\end{smallmatrix}\big), \big(\begin{smallmatrix}-1.05&0.03\\0.03&-1.05\end{smallmatrix}\big)$.


ACKNOWLEDGMENTS

We thank several anonymous reviewers for their suggestions, which helped improve the readability of this paper from its earlier version. Yixin Tao acknowledges NSF grants CCF-1527568 and CCF-1909538, and ERC Starting Grant ScaleOpt-757481. Yun Kuen Cheung acknowledges Singapore NRF 2018 Fellowship NRF-NRFF2018-07.

C MULTI-PLAYER GAMES

Computing the volume change of a learning algorithm in a multi-player game is slightly more involved than in the two-player case. We present a local equivalence formula of volume change between normal-form and graphical games. This provides an intuitive procedure for understanding volume changes. Proposition 20 shows that in multi-player games, the volume-changing behaviors of MWU and OMWU are again opposite to each other (which was shown for bimatrix games in Cheung & Piliouras (2020)).

Graphical Games. A graphical game (Kearns et al. (2001)) is a special type of $N$-player game where the payoffs can be compactly represented. In a graphical game $H$, for each pair of Players $i, k$, there is an edge-game, which is a bimatrix game between the two players, denoted by $(H^{i,k}, (H^{k,i})^T)$, where $H^{i,k}\in\mathbb R^{n_i\times n_k}$ is the payoff matrix that denotes the payoffs to Player $i$. The payoff to Player $i$ at strategy profile $s = (s_1, s_2, \ldots, s_N)$ is the sum of the payoffs to Player $i$ in all her edge-games, i.e. $u^i(s) = \sum_{k\neq i} H^{i,k}_{s_i, s_k}$. As is standard, this payoff function is extended via expectation when the inputs are mixed strategies.

Here, we first use an observation from Cheung & Piliouras (2019) to construct a family of multi-player graphical games where MWU is Lyapunov chaotic in $S_{N,\delta}$: it suffices that the corresponding bimatrix condition holds for all pairs of Players $i < k$. This observation yields Theorem 18.

Theorem 18. Let $G^\uparrow$ denote the family of bimatrix games which satisfy the condition either in Theorem 15 or in Theorem 17. In an $N$-player graphical game where each edge-game is drawn from $G^\uparrow$, if all players employ MWU with a sufficiently small step-size $\epsilon$, then the dynamical system is Lyapunov chaotic in $S_{N,\delta}$ with a positive Lyapunov exponent.

Local Equivalence of General Games and Graphical Games. Next, we present a theorem which connects the value of $C^G(p)$ of a general game to $C^H(p)$, where $H$ is a graphical game.

Theorem 19.
Given an $N$-player normal-form game $G$ and any point $p$ in the cumulative payoff space, the value of $C^G(p)$ is the same as $C^H(p)$, where $H$ is a graphical game specified as follows: for each pair of Players $i, k$ and $j\in S_i$, $\ell\in S_k$, the payoff to Player $i$ in her edge-game with Player $k$, when Player $i$ picks $j$ and Player $k$ picks $\ell$, is $H^{ik}_{j\ell} := U^{ik}_{j\ell}$, where $U^{ik}_{j\ell}$ is defined in Eqn. (1).

This theorem shows that for any game $G$, the value of $C^G(p)$ is the same as in a particular graphical game, in which each pair of players $(i,k)$ plays a bimatrix game whose utility is exactly the utility of the original game $G$, with the expectation taken over the randomness of the other players' strategies. If the original game $G$ is itself a graphical game, then in the induced graphical game $H^{ik}_{j\ell} = U^{ik}_{j\ell} + c_{-i,-k}$, where $c_{-i,-k}$ is a parameter which does not depend on Players $i$ and $k$.

Theorem 19 will be used in Appendix D to show the following proposition, which shows that the volume-changing behaviors of MWU and OMWU are opposite to each other in multi-player games, generalizing a prior result of Cheung & Piliouras (2020).

Proposition 20. The volume integrands of MWU and OMWU in a multi-player game $G$ are respectively $1 + \epsilon^2\, C^G(p) + O(\epsilon^3)$ and $1 - \epsilon^2\, C^G(p) + O(\epsilon^3)$. Thus, volume expands locally around a cumulative payoff point $p$ for MWU (resp. OMWU) if $C^G(p)$ is positive (resp. negative).

Multi-player Potential Games. By Observation 10, we know that the volume behavior of a bimatrix potential game is equivalent to that of a corresponding coordination game. In this section, we show that this holds even in the multi-player setting.

Proposition 21. Suppose $P$ is the potential function of a potential game $U$. Let $U_P$ be the game in which every player receives $P(s)$ when the players play strategy profile $s$. Then $C^U(p) = C^{U_P}(p) \le 0$.

In Appendix E, we discuss some situations where $C^U(p)$ is strictly less than 0, so that OMWU is Lyapunov chaotic therein.

D LOCAL EQUIVALENCE OF VOLUME CHANGE BETWEEN NORMAL-FORM AND GRAPHICAL GAMES

Here, we are concerned with the volume change of a learning algorithm in a multi-player game. We first recap from Cheung & Piliouras (2020) how the volume change is computed for dynamical systems.

