CHAOS OF LEARNING BEYOND ZERO-SUM AND COORDINATION VIA GAME DECOMPOSITIONS

Abstract

It is of primary interest for Machine Learning to understand how agents learn and interact dynamically in competitive environments and games (e.g. GANs). But this has been a difficult task, as irregular behaviors are commonly observed in such systems. This can be explained theoretically, for instance, by the works of Cheung & Piliouras (2019; 2020), which showed that in two-person zero-sum games, if agents employ one of the most well-known learning algorithms, Multiplicative Weights Update (MWU), then Lyapunov chaos occurs everywhere in the cumulative payoff space. In this paper, we study how persistent chaos can occur in the general normal-form game settings, where the agents might have the motivation to coordinate (which is not true for zero-sum games) and the number of agents can be arbitrary. We characterize bimatrix games where MWU, its optimistic variant (OMWU) or Follow-the-Regularized-Leader (FTRL) algorithms are Lyapunov chaotic almost everywhere in the cumulative payoff space. Our characterizations are derived by extending the volume-expansion argument of Cheung & Piliouras via the canonical game decomposition into zero-sum and coordination components. Interestingly, the two components induce opposite volume-changing behaviors, so the overall behavior can be analyzed by comparing the strengths of the components against each other. The comparison is done via our new notion of "matrix domination" or via a linear program. For multi-player games, we present a local equivalence of volume change between general games and graphical games, which is used for volume and chaos analyses of MWU and OMWU in potential games.

1. INTRODUCTION

In Machine Learning (ML), it is of primary interest to understand how agents learn in competitive environments. This interest has recently been further propelled by the success of Generative Adversarial Networks (GANs), which can be viewed as two neural networks playing a zero-sum game. As such, Evolutionary Game Theory (EGT) (Hofbauer & Sigmund (1998); Sandholm (2010)), a decades-old area devoted to the study of adaptive (learning) behaviors of agents in competitive environments arising from Economics, Biology, and Physics, has drawn attention from the ML community. In contrast with the typical optimization (or no-regret) approach in ML, EGT provides a dynamical-systemic perspective to understand ML processes, which has already provided new insights into a number of ML-related problems. This perspective is particularly helpful in studying "learning in games", where irregular behaviors are commonly observed, but the ML community currently lacks a rigorous method to analyze such systems.

In this paper, we study Lyapunov chaos, a central notion that captures instability and unpredictability in dynamical systems. We characterize general normal-form games where popular learning algorithms exhibit chaotic behaviors. Lyapunov chaos captures the butterfly effect: when the starting point of a dynamical system is slightly perturbed, the resulting trajectories and final outcomes diverge quickly; see Definition 1 for a formal definition. The perturbations correspond to round-off errors of numerical algorithms in ML (and Computer Science in general). [1] While significant efforts have been spent in analyzing and minimizing round-off effects of floating-point computations (Demmel (1997)), they are unavoidable in general. [2]
As round-offs are inevitable, and the round-off schemes can vary from machine to machine due to various hardware and software factors, we surely want to avoid chaotic learning, which defeats our primary goal of building predictable and reproducible learning systems. Such issues are exemplified by a quote from Ali Rahimi's NIPS'2017 test-of-time award speech: "Someone on another team changed the default rounding mode of some Tensorflow internals from 'truncate toward zero' to 'round to even'. Our training broke, our error rate went from less than 25% error to ∼99.97% error." To avoid chaotic learning, we first need to understand how it can arise. This is the main motivation of our work.

Recently, in the context of "learning in games", Cheung & Piliouras (2019; 2020) presented interesting theoretical analyses to show that in two-person zero-sum and graphical constant-sum games, if the agents employ Multiplicative Weights Update (MWU) or Follow-the-Regularized-Leader (FTRL) algorithms, then Lyapunov chaos occurs everywhere in the cumulative payoff space; the same result holds for the optimistic variant of MWU (OMWU) in coordination games. [3] While zero-sum and coordination games are interesting in their own right, they are rather small subspaces within the family of general normal-form games. In this paper, we present techniques and tools for characterizing general games where MWU, FTRL, or OMWU are Lyapunov chaotic almost everywhere. Next, we give an overview of our contributions and a discussion of related work.

Our Contributions. To show the results about chaos mentioned above, Cheung & Piliouras (2019; 2020) used a classical technique in the study of dynamical systems called volume analysis. Volume analysis considers a set of starting points of positive Lebesgue measure, e.g. a ball centred at a point; volume is an alternative name for Lebesgue measure in this context.
When this set of starting points evolves according to the rule of the dynamical system, it becomes a new set with a different volume. Intuitively, volume is a measure of the range of possible outcomes, so the larger it is, the more unpredictable the system is. If the set's volume increases exponentially with time, then its diameter increases exponentially too, which implies Lyapunov chaos. This indicates that when players repeatedly play the games by employing the respective learning algorithms, a slight perturbation of the initial condition can lead to a wide range of possible cumulative payoffs in the long run. This can be shown to imply instability in the mixed strategy space.

The technical starting point of Cheung & Piliouras is to show that when all agents in a bimatrix game $G$ use MWU with step-size $\epsilon$, and when a set $S$ in the cumulative payoff space evolves for one time step, its volume change can be expressed as $\epsilon^2 \int_S C_G(s)\,ds + O(\epsilon^4)$, where $C_G$ is a function that depends on the game $G$, which we will define in Section 2. Clearly, the sign of $C_G$ dictates the volume change behaviors for all small enough $\epsilon$. Cheung & Piliouras showed that $C_G$ is a positive function when $G$ is a two-person zero-sum game. For a large region in the cumulative payoff space, this implies the volume change per time step is $\Omega(\epsilon^2) \cdot \mathrm{volume}(S)$, i.e. volume expands exponentially. They also showed that $C_G$ is a negative function if $G$ is a two-person coordination game.

To extend their volume expansion results to general bimatrix games, we first discover that $C_G$ admits a clean decoupling w.r.t. the canonical decomposition of such games into zero-sum and coordination components (Basar & Ho (1974); Kalai & Kalai). Precisely, any two-person general game $(A, B)$ can be written uniquely as a direct sum of a zero-sum game $(Z, -Z)$ and a coordination game $(C, C)$, where $Z = (A - B)/2$ and $C = (A + B)/2$. Interestingly, we find that $C_G(\cdot) = C_{(Z,-Z)}(\cdot) + C_{(C,C)}(\cdot)$ in Lemma 6.
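The decomposition and the opposite volume behaviors of its two components can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's construction: it assumes the MWU update in cumulative payoff space is $p' = p + \epsilon A\,y(q)$, $q' = q + \epsilon B^T x(p)$ with $x, y$ given by softmax, and it uses the $\epsilon^2$ coefficient of the one-step Jacobian determinant, $-\mathrm{tr}(A D_y B^T D_x)$, as a pointwise stand-in for $C_G$ (whose exact definition is deferred to Section 2). All function names and the example payoff matrices are hypothetical.

```python
import math

def softmax(v):
    """Mixed strategy assumed to be played by MWU at cumulative payoff vector v."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    total = sum(e)
    return [a / total for a in e]

def decompose(A, B):
    """Split (A, B) into zero-sum part Z = (A - B)/2 and coordination part C = (A + B)/2."""
    Z = [[(a - b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    C = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return Z, C

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def negate(P):
    return [[-a for a in row] for row in P]

def softmax_jacobian(x):
    """Jacobian of softmax at output x: diag(x) - x x^T."""
    n = len(x)
    return [[(x[i] if i == j else 0.0) - x[i] * x[j] for j in range(n)]
            for i in range(n)]

def volume_coeff(A, B, p, q):
    """eps^2 coefficient of det(one-step Jacobian) at (p, q): -tr(A D_y B^T D_x).

    Assumption: det = det(I - eps^2 M N) = 1 - eps^2 tr(M N) + O(eps^4),
    with M = A D_y and N = B^T D_x under the update stated in the lead-in.
    """
    x, y = softmax(p), softmax(q)
    M = matmul(A, softmax_jacobian(y))
    N = matmul(transpose(B), softmax_jacobian(x))
    MN = matmul(M, N)
    return -sum(MN[i][i] for i in range(len(MN)))

# Hypothetical 2x2 bimatrix game and an arbitrary point in cumulative payoff space.
A = [[3.0, 0.0], [0.0, 1.0]]
B = [[0.0, 1.0], [2.0, 0.0]]
Z, C = decompose(A, B)
p, q = [0.3, -0.2], [0.1, 0.4]

c_game = volume_coeff(A, B, p, q)
c_zero_sum = volume_coeff(Z, negate(Z), p, q)  # expected sign: positive (expansion)
c_coord = volume_coeff(C, C, p, q)             # expected sign: negative (contraction)
print(c_game, c_zero_sum, c_coord)
```

In this sketch the coefficient of the full game equals the sum of the coefficients of its two components, because the cross terms cancel under the trace; the sign of the sum then depends on which component is stronger at the chosen point.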
Recall from the last paragraph that $C_{(Z,-Z)}(\cdot)$ is always positive, while $C_{(C,C)}(\cdot)$ is always negative. Thus, determining whether volume expansion occurs boils down to comparing the strengths of $C_{(Z,-Z)}(\cdot)$ and $-C_{(C,C)}(\cdot)$. We also discover that the function $C_G$ is invariant under additions of trivial matrices to the game $G$; see Definition 7 and Lemma 8. An immediate application of trivial matrices is to bimatrix potential games (Monderer & Shapley (1996)): we show that any such game can be transformed into a coordination game via additions of trivial matrices. With the result in Cheung & Piliouras (2020), this immediately implies that OMWU in any bimatrix potential game is Lyapunov chaotic everywhere in the cumulative payoff space (Observation 10).

Based on the above discoveries, we identify two characterizations of bimatrix games where MWU and FTRL are Lyapunov chaotic almost everywhere (Theorems 15 and 17). As said before,



[1] In games, such perturbations can also occur due to errors in measuring payoffs.
[2] For instance, in learning algorithms we often evaluate the function $e^x$, while $e$ is an irrational number.
[3] They also showed some other chaos results for various combinations of learning algorithms and games.

