CONVERGENCE IS NOT ENOUGH: AVERAGE-CASE PERFORMANCE OF NO-REGRET LEARNING DYNAMICS
Anonymous

Abstract

Learning in games involves two main challenges, even in settings in which agents seek to coordinate: convergence to equilibria and selection of good equilibria. Unfortunately, solving the issue of convergence, which is the focus of state-of-the-art models, conveys little information, and often none at all, about the quality of the equilibria that are eventually reached. In this paper, we study a class of arbitrary-sized games in which the q-replicator dynamics (QRD), a widely studied class of no-regret learning dynamics that includes gradient descent (GD), standard replicator dynamics (RD), and log-barrier dynamics as special cases, can be shown to converge pointwise to Nash equilibria. Turning to our main task, we provide both theoretical and experimental results on the average-case performance of different learning dynamics in games. For example, in the case of GD, we show a tight average Price of Anarchy bound of 2 for a class of symmetric 2 × 2 potential games with unbounded Price of Anarchy (PoA). Furthermore, in the same class, we provide necessary and sufficient conditions under which GD outperforms RD in an average-case analysis, giving novel insights about two of the most widely applied dynamics in game theory. Finally, our experiments suggest that unbounded gaps between average-case performance and PoA analysis are common, indicating a fertile area for future work.

1. INTRODUCTION

Multi-agent coordination often involves the solution of complex optimization problems. What makes these problems so hard, even when agents have common (Bard et al., 2020) or aligned interests (Dafoe et al., 2020; Dafoe et al., 2021), is that learning occurs on highly non-convex landscapes; thus, even if the learning dynamics equilibrate, their fixed points may include unnatural saddle points or even local minima of very poor performance (Dauphin et al., 2014). To address this issue, a large stream of recent work has focused on the convergence of optimization-driven (e.g., no-regret) learning dynamics to good limit points. Notable results include avoidance of saddle points and convergence of first-order methods, e.g., gradient descent, to local optima (Ge et al., 2015; Lee et al., 2019; Mertikopoulos et al., 2019), pointwise or last-iterate convergence of various learning dynamics to (proper notions of) equilibria in zero-sum (competitive) games (Daskalakis & Panageas, 2019; Bailey & Piliouras, 2019; Cai et al., 2022), and convergence of no-regret learning to stable points in potential (cooperative) games (HeliouCM17; PPP17; DBLP:journals/corr/abs-2203-12056; Leo22).

Even though these results seem to provide a sufficient starting point to reason about the quality of the collective learning outcome, this is unfortunately far from true. Non-trivial game settings routinely possess attracting points of vastly different performance, and this remains true even if one is able to restrict attention to refined and highly robust notions of equilibria (Flokas et al., 2020). Nevertheless, and despite the intense interest of the machine learning community in the problem of equilibrium selection, there is a remarkable scarcity of work in this direction. To make matters worse, static, game-theoretic approaches to the problem (Harsanyi, 1973; Harsanyi & Selten, 1988; van Damme, 1987) offer little insight, often none at all, from a dynamic/learning perspective.
In this case, the challenge is to show approximately optimal performance not for (almost) all initial conditions (which is not possible), but in expectation, i.e., for initial conditions chosen uniformly at random (worst-case versus average-case analysis). This is a fundamentally hard problem since one has to couple the performance of equilibria to the relative size of their regions of attraction. However, regions of attraction are complex geometric manifolds that quickly become mathematically intractable even in low-dimensional settings. Importantly, their analysis requires the combination of tools from machine learning, game theory, non-convex optimization and dynamical systems. In terms of average-case analysis of game-theoretic dynamics in coordination/common-interest games, the only other references that we know of are Zhang & Hofbauer (2015) and Panageas & Piliouras (2016). In fact, Panageas & Piliouras (2016) is the key precursor to our work. Critically, whereas Panageas & Piliouras (2016) focuses exclusively on a single dynamic, i.e., the replicator dynamics, and on bounding its average price of anarchy (APoA) in restricted instances of games such as Stag Hunt, we show how these techniques can be applied much more broadly by addressing novel challenges:
• Axiomatic challenge: Can we formally define the notion of Average Price of Anarchy for large classes of dynamics and games?
• Analytical challenge: Even if the definitions can be made robust, how do we analyze these nonlinear dynamical systems given random initial conditions in the presence of multiple attractors?
• Experimental/visualization challenge: Can we develop novel custom visualization techniques as well as showcase that our experimental results have predictive power even in complex high-dimensional settings?

Model and Contributions.
To make progress in addressing these challenges, we study the q-replicator dynamics (QRD), one of the most fundamental and widely studied classes of multi-agent learning dynamics, which includes gradient descent, replicator and log-barrier dynamics as special cases (A. Giannou, 2021). We start with our first motivating question, which we answer affirmatively by proving pointwise convergence of all QRD to Nash equilibria (NEs) in almost all finite potential games. Potential games include multi-agent interactions in which coordination is desirable, congestion games and games of identical interests as important and widely studied subclasses (Wang & Sandholm, 2002; Panait & Luke, 2005; Carroll et al., 2019; Dafoe et al., 2020). The proof of pointwise convergence to NEs combines recent advances (Swenson et al., 2020) with standard convergence techniques in the study of potential games, e.g., Palaiopanos et al. (2017b). Such techniques have been used either to establish convergence of QRD to NEs under the assumption of pointwise convergence (Mertikopoulos & Sandholm, 2016) or to prove convergence to limit cycles of (restricted) equilibrium points (Mertikopoulos & Sandholm, 2018). However, whereas in previous works such results are the main focus, in our case they are only the starting point, as they clearly do not suffice to explain the disparity between the regularity of QRD in theory (bounded regret, convergence to Nash equilibria) and their conflicting performance in practice (agents' utilities after learning).

We then turn to our second question and the fundamental problem of equilibrium quality. While different QRD may reach the same asymptotically stable equilibria, this is only a minimal and definitely not sufficient condition to compare their performance. In particular, the regions of attraction of these common attracting equilibria, i.e., the sets of convergent initial conditions, can be very different for different QRD.
In our main technical contribution, we tackle this task by providing geometric insights into the shapes and sizes of the regions of attraction of different QRD. We show that in a class of two-agent potential games, gradient descent reaches the payoff-dominant (socially optimal) equilibrium more often than standard replicator whenever this equilibrium is also risk-dominant (less risky), see Figure 1. As an implication, we study a class of games in which the Price of Anarchy is unbounded, i.e., in which the worst-case equilibrium can be arbitrarily worse than the socially optimal outcome (Panageas & Piliouras, 2016), and derive a (tight) upper bound of 2 for the Average Price of Anarchy of the gradient descent dynamics in all instances of the class in which the risk- and payoff-dominant equilibria coincide. This is the first such tight result of its kind. Conceptually, our methods provide a systematic approach to explore the design and hyperparameter space of learning dynamics and extend recent advances towards a taxonomy of learning dynamics in low-dimensional or general potential games (Panageas & Piliouras, 2016; Pangallo et al., 2022). More importantly, they demonstrate the expressiveness, in this task, of performance measures that couple the likelihood of convergence to a certain outcome (region of attraction) with the performance of an algorithm at this outcome.

From a practical perspective, our findings admit a dual interpretation. On the one hand, they provide concrete recommendations about the optimality of different QRD based on the features of the underlying game. On the other hand, they suggest that, even in the simplest possible classes of games, there is no single optimal QRD that beats all others. Intriguingly, the above results hinge on two interconnected, yet fundamentally different, theories.
The first part (convergence) relies on Lyapunov analysis and the properties of dissipative systems, i.e., systems that lose momentum over time until they converge to a steady state. By contrast, the second part, i.e., the qualitative analysis of the different parametrizations of the QRD, relies on the existence of invariant functions that characterize stable and unstable areas in the state space of such systems (Palaiopanos et al., 2017a; Nagarajan et al., 2020). The existence of invariant functions, however, is a feature most often studied in conservative systems, a principle that is fundamentally orthogonal to that of dissipation.

2. PRELIMINARIES: GAME-THEORETIC AND BEHAVIORAL MODELS

$x_k = (x_{k a_k})_{a_k \in \mathcal{A}_k} \in \mathcal{X}_k$, where $x_{k a_k}$ is the probability with which agent $k$ uses their action $a_k \in \mathcal{A}_k$ and $\mathcal{X}_k := \{x_k \in \mathbb{R}^{|\mathcal{A}_k|} \mid \sum_{a_k \in \mathcal{A}_k} x_{k a_k} = 1,\ x_{k a_k} \ge 0\}$ is the $(|\mathcal{A}_k| - 1)$-dimensional simplex. Given any mixed action $x_k \in \mathcal{X}_k$, we will write $\operatorname{supp}(x_k) := \{a_k \in \mathcal{A}_k \mid x_{k a_k} > 0\}$ to denote the support of the action $x_k$, i.e., the set of all pure actions $a_k \in \mathcal{A}_k$ that are selected with positive probability at $x_k$. Using conventional notation, we write $s = (s_k, s_{-k}) \in \mathcal{A}$ and $x = (x_k, x_{-k}) \in \mathcal{X} := \prod_{k \in \mathcal{N}} \mathcal{X}_k$ to denote the joint pure and mixed action profiles of $\Gamma$, where $s_{-k}$ and $x_{-k}$ are the vectors of pure and mixed actions, respectively, of all agents other than $k$. When time is relevant, we will use the index $t$ for all the above, e.g., we will write $x_k(t)$ for agent $k$'s choice distribution at time $t \ge 0$. The function $\Phi : \mathcal{A} \to \mathbb{R}$ is called a potential function of $\Gamma$ and satisfies
$$u_k(s) - u_k(s'_k, s_{-k}) = \Phi(s) - \Phi(s'_k, s_{-k}), \quad \text{for all } k \in \mathcal{N} \text{ and all } s, s' \in \mathcal{A}.$$
The agents' reward functions and the potential function extend naturally to mixed action profiles with $u_k(x) = \mathbb{E}_{s \sim x}[u_k(s)]$ and $\Phi(x) = \mathbb{E}_{s \sim x}[\Phi(s)]$.

Regular Nash and restricted equilibria. A Nash equilibrium (NE) of $\Gamma$ is an action profile $x^* \in \mathcal{X}$ such that $u_k(x^*) \ge u_k(x_k, x^*_{-k})$ for all $k \in \mathcal{N}$ and all $x_k \in \mathcal{X}_k$. By linearity of expectation, the above definition is equivalent to
$$u_k(x^*) \ge u_k(a_k, x^*_{-k}), \quad \text{for all } a_k \in \mathcal{A}_k \text{ and all } k \in \mathcal{N}, \tag{1}$$
where $u_k(a_k, x^*_{-k})$ denotes the reward of agent $k$ when they play the pure action $a_k$ against the mixed strategies $x^*_{-k}$ of the rest of the agents. Let $\mathrm{NE}(\Gamma)$ denote the set of all NE of $\Gamma$. A NE is called symmetric if $x^*_1 = \dots = x^*_n$, and is called fully mixed if $\operatorname{supp}(x^*) = \prod_{k \in \mathcal{N}} \operatorname{supp}(x^*_k) = \mathcal{A}$. A NE is called regular if it satisfies the following definition.

Definition 2.1 (Regular Nash equilibria (Harsanyi, 1973; Swenson et al., 2020)).
A Nash equilibrium, $x^* \in \mathrm{NE}(\Gamma)$, is called regular if it is (i) quasi-strict, i.e., if for each player $k \in \mathcal{N}$, $x^*_k$ assigns positive probability to all best responses of player $k$ against $x^*_{-k}$, and (ii) second-order non-degenerate, i.e., if the Hessian, $H(x^*)$, taken with respect to $\operatorname{supp}(x^*)$ is non-singular.

Finally, a restriction of $\Gamma$ is a game $\Gamma' := \{\mathcal{N}, (\mathcal{A}'_k, u'_k)_{k \in \mathcal{N}}\}$, where $\mathcal{A}'_k \subseteq \mathcal{A}_k$ and $u'_k : \mathcal{A}' \to \mathbb{R}$ is the restriction of $u_k$ to $\mathcal{A}' := \prod_{k \in \mathcal{N}} \mathcal{A}'_k$ for all $k \in \mathcal{N}$. An action profile $x \in \mathcal{X}$ is called a restricted equilibrium of $\Gamma$ if it is a Nash equilibrium of a restriction of $\Gamma$, cf. Mertikopoulos & Sandholm (2018). It is easy to see that all restrictions of a potential game $\Gamma := \{\mathcal{N}, (\mathcal{A}_k, u_k)_{k \in \mathcal{N}}, \Phi\}$ are potential games, whose potential functions are restrictions of $\Phi$ to the respective subspaces of $\mathcal{A}$.

Behavioral-learning model. The evolution of the agents' choice distributions (or mixed actions) in the joint action space $\mathcal{X}$ is governed by the q-replicator dynamics (QRD), which are the parametric dynamics described by the system of differential equations (equations of motion) $\dot{x} := V_q(x)$, where $V_q : \mathcal{X} \to \mathbb{R}^{|\mathcal{A}|}$ is given by
$$\dot{x}_{k a_k} = x_{k a_k}^{q} \left( u_k(a_k, x_{-k}) - \frac{\sum_{a_j \in \mathcal{A}_k} x_{k a_j}^{q}\, u_k(a_j, x_{-k})}{\sum_{a_j \in \mathcal{A}_k} x_{k a_j}^{q}} \right), \quad \text{for all } k \in \mathcal{N},\ a_k \in \mathcal{A}_k, \tag{QRD}$$
for any $q \ge 0$. Special cases of the above dynamics are the projection or gradient descent (GD) dynamics, for $q = 0$, the standard replicator (RD) dynamics, for $q = 1$, and the log-barrier or inverse update dynamics, for $q = 2$ (Mertikopoulos & Sandholm, 2018; A. Giannou, 2021).

3. POINTWISE CONVERGENCE OF QRD TO NASH EQUILIBRIA

Our results consist of two parts. In the first part, which is the subject of this section, we show convergence of QRD to Nash equilibria in a class of potential games, which we term perfectly-regular potential games (PRPGs), whose definition follows.
Almost all potential games are PRPGs; this is a generalization of Swenson et al. (2020), who prove that almost all potential games are regular. Furthermore, the PRPG class contains other important subclasses of games, e.g., congestion games, as well as games with identical reward functions, which are currently widely studied in the context of cooperative artificial intelligence (Wang & Sandholm, 2002; Panait & Luke, 2005; Carroll et al., 2019; Dafoe et al., 2020). The convergence result is stated formally in Theorem 3.2; its complete proof may be found in ??.

Theorem 3.2 (Pointwise convergence of QRD to NE in PRPGs). Given any perfectly-regular potential game (PRPG), $\Gamma$, and any interior initial condition $x(0) \in \operatorname{int} \mathcal{X}$, the q-replicator dynamics, defined as in equation (QRD), converge pointwise to a Nash equilibrium $x^*$ of $\Gamma$ for any parameter $q \ge 0$. Furthermore, the set
$$Q(\operatorname{int} \mathcal{X}) := \bigcup_{x_0 \in \operatorname{int} \mathcal{X}} \{x^* \in \mathcal{X} \mid \lim_{t \to \infty} x(t) = x^*,\ x(0) = x_0\},$$
i.e., the set of all limit points of interior initial conditions, is finite.

Sketch of the proof. The proof of Theorem 3.2 proceeds in two steps, which utilize the properties that (i) PRPGs have a finite number of regular equilibria, and (ii) the probability of optimal actions near an equilibrium point is increasing in time under the QRD. In the first step, we prove that for any initial condition, the sequence of joint action profiles $(x(t))_{t \ge 0}$ that is generated by QRD for any $q \ge 0$ converges to a restricted equilibrium of a PRPG, $\Gamma$. This relies on the fact that the set of cluster (limit) points of the trajectory, also called the ω-limit set, is finite and, in fact, as we show, a singleton (a single-element set) for any PRPG. In turn, this follows from the fact that a PRPG provably contains only a finite number of restricted equilibria.
Having established convergence to restricted equilibria, in the second step, it remains to show that any such limit point has to be a NE of $\Gamma$, i.e., we need to exclude convergence to restricted equilibria that are not NE of $\Gamma$. To establish this, we couple the structure of PRPGs, which ensures that there is a finite number of (regular) restricted equilibria, with the nature of QRD, which guarantees that in the vicinity of a limit point, optimal actions, i.e., best responses, need to be played with increasingly higher probability. Thus, all actions in the support of the limit choice distribution of each agent must be best responses against the actions of all other agents, which implies that all points that can be reached by QRD are NE of $\Gamma$.

In other words, Theorem 3.2 says that for almost all potential games and almost all initial conditions, QRD converge to a NE of the game. An important implication of Theorem 3.2 is that, when reasoning about the quality of the collective learning outcome in cooperative multi-agent settings (as captured by PRPGs), one can restrict attention to NE. However, by reverting to off-the-shelf, static performance measures that compare the quality of different NE, we only obtain results that are meaningless, if not misleading, from a dynamic/learning perspective. The reason is that certain bad (or sometimes even good) NE may be reachable only from a very small set of initial conditions. Thus, we need to develop and argue about average performance measures that couple the outcome of the learning process (a NE of a PRPG) with the likelihood that such an outcome is reached by the given learning dynamic (the region of attraction of this NE). This is the subject of the next section.
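The convergence statement of Theorem 3.2 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it discretizes equation (QRD) with a forward Euler step for a two-agent game. The function names (`qrd_field`, `simulate`), the step size, the horizon, and the 2 × 2 identical-interest payoff matrix are all assumptions made for illustration.

```python
def qrd_field(x, payoffs, q):
    """Right-hand side of (QRD) for a single agent.

    x:       the agent's mixed action (probabilities over its pure actions)
    payoffs: payoffs[a] = u_k(a, x_{-k}), expected reward of pure action a
    q:       QRD parameter (q = 1 gives the standard replicator dynamics)
    """
    weights = [p ** q for p in x]
    avg = sum(w * u for w, u in zip(weights, payoffs)) / sum(weights)
    return [w * (u - avg) for w, u in zip(weights, payoffs)]


def simulate(A, B, x, y, q=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of QRD in a two-agent game.

    A[a][b] and B[a][b] are the rewards of agents 1 and 2 when agent 1
    plays a and agent 2 plays b. Euler is an illustrative choice and only
    approximates the continuous-time flow.
    """
    x, y = list(x), list(y)
    for _ in range(steps):
        u1 = [sum(A[a][b] * y[b] for b in range(len(y))) for a in range(len(x))]
        u2 = [sum(B[a][b] * x[a] for a in range(len(x))) for b in range(len(y))]
        dx, dy = qrd_field(x, u1, q), qrd_field(y, u2, q)
        x = [p + dt * d for p, d in zip(x, dx)]
        y = [p + dt * d for p, d in zip(y, dy)]
    return x, y


# Hypothetical identical-interest game: both agents receive A[a][b].
A = [[1.0, 0.0], [0.0, 2.0]]
x_final, y_final = simulate(A, A, [0.5, 0.5], [0.5, 0.5])
print(x_final, y_final)
```

From the uniform initial condition, both agents in this sketch end up placing almost all probability on the second action, i.e., the trajectory approaches a pure NE, in line with the pointwise convergence described above.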

4. QUALITY OF THE COLLECTIVE LEARNING OUTCOME

When static performance metrics fail. Having established that in the landscape of potential games, QRD converge almost surely to Nash equilibria, we next turn our attention to the main challenge of quantifying the quality of the collective learning outcome. In order to do that, one first has to establish appropriate performance metrics. In a static regime, we can rely on a variety of meaningful metrics, e.g., the Price of Anarchy (PoA) (Koutsoupias & Papadimitriou, 1999; Christodoulou & Koutsoupias, 2005; Roughgarden, 2015), which is defined as the ratio between the social welfare of the socially optimal outcome and that of the socially worst NE of the game, where the social optimality of an outcome $x \in \mathcal{X}$ is measured with respect to the social welfare $\mathrm{SW}(x) := \sum_{k \in \mathcal{N}} u_k(x)$, i.e., the total reward of the agents. The PoA is a natural static metric that one may consider in a PRPG setup. After all, coordination is the essence of potential games, which typically model multi-agent settings where this is a desirable property. However, it is not difficult to find PRPGs where the PoA fails to provide any meaningful information about the game. Let us consider the following example.

Example 4.1 (A simple example of unbounded performance loss). Consider the parametric $2 \times 2$ PRPG, $\Gamma_w$, i.e., a 2-player, 2-action PRPG, with payoff functions $u_{w,1}(s_1, s_2) = u_{w,2}(s_2, s_1) = A_w(s_1, s_2)$, where the matrix $A_w \in \mathbb{R}^{2 \times 2}$ is given by
$$A_w = \begin{pmatrix} 1 & 0 \\ 0 & w \end{pmatrix}, \quad 1 \le w.$$
The games $\Gamma_w$ are already expressive enough to capture the aforementioned problem. To see this, observe that the NE that corresponds to $x_1 = (1, 0)$ and $x_2 = (1, 0)$ has social welfare $\mathrm{SW}(x) = 1 + 1 = 2$, but the NE that corresponds to $x'_1 = (0, 1)$ and $x'_2 = (0, 1)$ has $\mathrm{SW}(x') = w + w = 2w$. Since $w$ can take any value larger than 1, the difference in performance, as measured by the PoA, can be arbitrarily large. Specifically,
$$\mathrm{PoA}(\Gamma_w) = \frac{\mathrm{SW}(x')}{\mathrm{SW}(x)} = w \to \infty \quad \text{as } w \to \infty.$$
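The welfare ratio of Example 4.1 can be reproduced with a few lines of code. This is a small sketch under explicit assumptions: it enumerates only the pure NE of the 2 × 2 game (the two equilibria compared in the example) and does not consider mixed equilibria; the helper names are hypothetical.

```python
def pure_nash_equilibria(A):
    """Pure NE of a symmetric 2x2 game with u1(s1, s2) = u2(s2, s1) = A[s1][s2]."""
    equilibria = []
    for s1 in range(2):
        for s2 in range(2):
            best1 = A[s1][s2] >= A[1 - s1][s2]  # agent 1 cannot gain by deviating
            best2 = A[s2][s1] >= A[1 - s2][s1]  # agent 2 cannot gain by deviating
            if best1 and best2:
                equilibria.append((s1, s2))
    return equilibria


def social_welfare(A, s1, s2):
    """Total reward of the two agents at the pure profile (s1, s2)."""
    return A[s1][s2] + A[s2][s1]


def pure_poa(w):
    """Best over worst social welfare among the pure NE of Gamma_w."""
    A = [[1.0, 0.0], [0.0, w]]
    welfare = [social_welfare(A, s1, s2) for s1, s2 in pure_nash_equilibria(A)]
    return max(welfare) / min(welfare)


for w in (1.0, 5.0, 100.0):
    print(w, pure_poa(w))
```

The ratio grows linearly in `w`, which is the unbounded gap the example describes.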

4.1. REGIONS OF ATTRACTION AND AVERAGE PERFORMANCE MEASURES

While useful in static environments, the PoA metric fails to capture the dynamic nature of multi-agent learning. In particular, it does not provide an answer to the question: How likely is it for the agents to reach a good or bad outcome given that the multi-agent system converges? To answer this question and argue about the collective performance of the game dynamics, we need to quantify the likelihood of each outcome when the initial conditions of the system are randomly sampled. A region of attraction of a given outcome formalizes this notion.

Definition 4.2 (Regions of attraction). Let $\Gamma$ be any game and assume that its joint action profile, $x \in \mathcal{X}$, evolves according to the equations of motion $\dot{x} = f(x)$. Then for any $x^* \in \mathcal{X}$, the set
$$\mathrm{RoA}_{f,\Gamma}(x^*) := \{x_0 \in \mathcal{X} \mid \lim_{t \to \infty} x(t) = x^*,\ x(0) = x_0\}$$
is called the region of attraction (RoA) of $x^*$ with respect to the dynamics $f$.

In other words, the RoA of a point $x^* \in \mathcal{X}$ is the set of all initial conditions in $\mathcal{X}$ for which the dynamics asymptotically converge to $x^*$. Note that the RoAs of distinct points do not intersect. If we can determine the regions of attraction of some game dynamics, then given a certain static performance metric, e.g., the social welfare, we can define a corresponding average-performance metric that weighs all possible outcomes, in the sense of limit points, according to their likelihood of occurring under the given dynamics. For this average to be meaningful, a minimum requirement is that the dynamics converge for almost all, i.e., all but a measure-zero set of, initial conditions. Formally, an average-performance metric is defined as follows.

Definition 4.3 (Average-performance metric). Let $\Gamma$ be a multi-agent game and assume that its joint action profile, $x \in \mathcal{X}$, evolves according to the equations of motion $\dot{x} = f(x)$. Let $\mathcal{X}_0 \subseteq \mathcal{X}$ be a set of initial conditions such that the set of convergence points $Q(\mathcal{X}_0)$ is finite.
Then, given a performance metric $g : \mathcal{X} \to \mathbb{R}$ of $\Gamma$, the average performance of the dynamics governed by $f$ in $\Gamma$ with respect to the performance metric $g$ and the set of initial conditions $\mathcal{X}_0$ is given by
$$\mathrm{APM}_{g,\mathcal{X}_0}(f, \Gamma) := \sum_{x^* \in Q(\mathcal{X}_0)} \mu\big(\mathrm{RoA}_{f,\Gamma}(x^*)\big) \cdot g(x^*), \tag{APM}$$
where $\mu$ is a probability measure on $\mathcal{X}_0$.

In other words, an APM is the expected value of some metric $g$ at the limit point of a random initialization of the dynamics in $\mathcal{X}_0 \subseteq \mathcal{X}$. For instance, if the performance metric $g$ is the social welfare, then the average-performance metric with respect to $g$ measures the expected social welfare of the system for a random initialization in $\mathcal{X}_0$. The average-performance metric that we are going to use in the remainder of this section is the Average Price of Anarchy (APoA). The APoA is an APM with respect to the social welfare, re-normalised such that the APoA is greater than or equal to 1, with equality only if (almost) all initial conditions converge to the socially optimal outcome of the system. Formally, given a multi-agent game $\Gamma$, equations of motion $\dot{x} = f(x)$ that describe the evolution of the agents' actions in $\Gamma$, and a set of initial conditions $\mathcal{X}_0 \subseteq \mathcal{X}$ that consists of almost all of $\mathcal{X}$, the APoA is given by the formula
$$\mathrm{APoA}(f, \Gamma) = \frac{\max_{x \in \mathcal{X}} \mathrm{SW}(x)}{\mathrm{APM}_{\mathrm{SW},\mathcal{X}_0}(f, \Gamma)}. \tag{APoA}$$
Here, it is important to note that Definition 4.3 does not ensure that an APM is always a meaningful metric for the system. However, as long as one can prove that (i) the dynamics converge pointwise to some $x^* \in Q(\mathcal{X}) \subseteq \mathrm{NE}(\Gamma)$ for almost all initial conditions $x_0 \in \mathcal{X}$, and (ii) the set of limit points, $Q(\mathcal{X})$, is finite (two conditions that are satisfied by any PRPG that evolves with respect to some QRD, cf. Theorem 3.2), the APoA has an intuitive interpretation. Specifically, in this setup, the APoA always lies between the Price of Stability (PoS) of the game, i.e., the ratio between the socially optimal outcome and the socially optimal NE, and the PoA.
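The definitions (APM) and (APoA) suggest a direct Monte Carlo estimator: sample initial conditions, follow the dynamics to their limit, and average the welfare of the limit points. The sketch below does this for the game $\Gamma_w$ of Example 4.1 under several assumptions that are not part of the definitions: $\mu$ is taken to be the uniform measure on the unit square, the flow is approximated by forward Euler, a trajectory is assigned to a pure NE once it is within a tolerance of the corresponding corner, and a crude fallback classifies the rare slow trajectories. All names and numerical parameters are hypothetical.

```python
import random


def two_action_field(p, delta_u, q):
    """(QRD) for an agent with two actions, written for p = P(first action).

    Substituting x = (p, 1 - p) into (QRD) gives
    dp/dt = p^q (1-p)^q / (p^q + (1-p)^q) * (u(first) - u(second)).
    """
    a, b = p ** q, (1.0 - p) ** q
    return a * b / (a + b) * delta_u


def limit_welfare(x, y, w, q=1.0, dt=0.05, max_steps=5000, tol=1e-3):
    """Integrate QRD in Gamma_w from (x, y); return SW at the limit corner."""
    for _ in range(max_steps):
        if x < tol and y < tol:            # both agents play the second action
            return 2.0 * w
        if x > 1 - tol and y > 1 - tol:    # both agents play the first action
            return 2.0
        dx = two_action_field(x, y - w * (1.0 - y), q)
        dy = two_action_field(y, x - w * (1.0 - x), q)
        x, y = x + dt * dx, y + dt * dy
    # Crude fallback for slow trajectories: pick the nearer pure NE.
    return 2.0 * w if x + y < 1.0 else 2.0


def estimate_apoa(w, q=1.0, samples=300, seed=0):
    """Monte Carlo estimate of APoA(V_q, Gamma_w) under uniform sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += limit_welfare(rng.random(), rng.random(), w, q)
    apm = total / samples        # estimate of APM with g = SW
    return 2.0 * w / apm         # max SW equals 2w in Gamma_w


print(estimate_apoa(2.0))
```

Because every sampled trajectory is assigned to one of the two pure NE, the estimate necessarily falls between 1 and `w`, which mirrors the PoS and PoA bounds discussed above.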

4.2. THE TAXONOMY OF QRD IN 2 × 2 PRPGS

To systematically evaluate and compare the performance of different QRD in perfectly-regular finite potential games, we address the case of symmetric $2 \times 2$ coordination games, i.e., games in which one can change the identities of the players without changing the payoff to the actions. Such games constitute one of the current frontiers in terms of classification of game dynamics (Zhang & Hofbauer, 2015; Pangallo et al., 2022). They are trivially potential games and include games of identical payoffs as special cases. Omitted definitions and proofs of this section may be found in ??.

Representation of symmetric $2 \times 2$ PRPGs. Recall that a NE, $x^*$, of a symmetric potential game $\Gamma$ is called payoff-dominant if $u_k(x^*) \ge u_k(x')$ for all $x' \in \mathrm{NE}(\Gamma)$, and it is called risk-dominant if $x^*$ is unilaterally optimal against the uniform distribution of the rest of the agents. All symmetric $2 \times 2$ PRPGs can be conveniently represented by the parametric class of games $\Gamma_{w,\beta}$, with payoff functions $u_{w,\beta,1}(s_1, s_2) = u_{w,\beta,2}(s_2, s_1) = A_{w,\beta}(s_1, s_2)$, where the matrix $A_{w,\beta} \in \mathbb{R}^{2 \times 2}$ is given by
$$A_{w,\beta} = \begin{pmatrix} 1 & 0 \\ \beta & w \end{pmatrix}, \quad \beta \le 1 \le w.$$
The game $\Gamma_{w,\beta}$ has the same NE as the original game, retains the payoff- and risk-dominance properties of its equilibrium points, and preserves the limiting behavior of any QRD (see ??). Each game $\Gamma_{w,\beta}$ has three NE: two pure ones at $x = y = 0$ and $x = y = 1$, with social welfare $\mathrm{SW}(0, 0) = 2w$ and $\mathrm{SW}(1, 1) = 2$, respectively, as well as one fully mixed NE at
$$x^* = y^* = \alpha := \frac{w}{w + 1 - \beta}.$$
For convenience, we are going to refer to the first pure NE as $x_w$. Note that $x_w$ is payoff-dominant for any parametrization $\Gamma_{w,\beta}$, and it is also risk-dominant whenever $w > 1 - \beta$, or equivalently, whenever $\alpha > 0.5$.
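The value of $\alpha$ and the risk-dominance condition follow from a short indifference computation, sketched below. Here $y$ denotes the probability that the opponent plays the first action, so that $y = 1$ corresponds to the pure NE with social welfare 2; this reading of the coordinates is an assumption made for the derivation, chosen to agree with the welfare values stated above.

```latex
% Expected rewards of the two pure actions against an opponent who plays
% the first action with probability y (rows of A_{w,\beta}).
\begin{align*}
u(\text{first action}, y)  &= 1 \cdot y + 0 \cdot (1 - y) = y, \\
u(\text{second action}, y) &= \beta y + w (1 - y).
\end{align*}
% Indifference between the two actions gives the fully mixed NE.
\begin{align*}
y = \beta y + w (1 - y)
  \iff y \, (w + 1 - \beta) = w
  \iff y = \frac{w}{w + 1 - \beta} = \alpha .
\end{align*}
% Risk dominance of x_w: compare the actions against the uniform opponent y = 1/2.
\begin{align*}
u(\text{second action}, \tfrac{1}{2}) > u(\text{first action}, \tfrac{1}{2})
  \iff \tfrac{1}{2} (\beta + w) > \tfrac{1}{2}
  \iff w > 1 - \beta
  \iff \alpha > \tfrac{1}{2} .
\end{align*}
```

The last equivalence uses $w + 1 - \beta > 0$, which holds because $\beta \le 1 \le w$.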
The first result of this section states that whenever the risk- and payoff-dominant equilibria of $\Gamma_{w,\beta}$ coincide, i.e., $\alpha \ge 0.5$, the gradient descent dynamics, i.e., the 0-replicator dynamics, perform better (or equally well in the case $\alpha = 0.5$) on average than the standard replicator dynamics with respect to the social welfare of their outcomes, i.e., they yield a smaller APoA. In any other instance of these games, i.e., for $\alpha < 0.5$, the RD perform better than GD with respect to the same metric.

Theorem 4.4 (Performance of QRD in symmetric $2 \times 2$ PRPGs). Given any $2 \times 2$ symmetric PRPG, which, without any loss of generality, can be represented as an instance $\Gamma_{w,\beta}$, it holds that
$$\mathrm{APM}_{\mathrm{SW},\operatorname{int}\mathcal{X}}(V_0, \Gamma_{w,\beta}) \ge \mathrm{APM}_{\mathrm{SW},\operatorname{int}\mathcal{X}}(V_1, \Gamma_{w,\beta})$$
if and only if the payoff-dominant equilibrium is also risk-dominant, with equality if and only if $\alpha = 0.5$, i.e., $w = 1 - \beta$, where $V_0, V_1$ are the equations of motion of the 0-replicator and 1-replicator dynamics, respectively (cf. equation (QRD)).

Interpretation of Theorem 4.4. The proof of Theorem 4.4 proceeds with a first-order analysis of the manifolds that separate the regions of attraction of the two pure equilibria for the different dynamics (cf. Figures 2 and 3). When comparing the gradient descent (GD) dynamics and the replicator dynamics (RD), the main implication of this theorem is that the expected social welfare is optimized by GD whenever the risk- and payoff-dominant equilibria coincide and is optimized by RD when they differ. More generally, this result may be interpreted in two ways. On the one hand, it provides a concrete recommendation on the optimal behavior of the agents (GD versus RD) based solely on the properties of the underlying game. On the other hand, it suggests that even in the low-dimensional setting of $2 \times 2$ potential games, there is no uniform recommendation, and the optimal behavior largely depends on the features of the underlying game.
As it turns out, in this case, the decisive feature is the riskiness of the payoff-dominant equilibrium.

Generalization to all QRD. Technically, the proof of Theorem 4.4 uses tools that are orthogonal to the Lyapunov analysis and the theory of dissipation of dynamical systems that we used to prove convergence to NE in section 3. It leverages constants of motion or invariant functions (Nagarajan et al., 2020), i.e., quantities that remain constant along the trajectories of the learning dynamics. The rationale is that if one can identify such a function, then, by finding its value at the unique mixed equilibrium $x^* = y^* = \alpha$ of the game, one can determine all initial conditions that asymptotically converge to it: these are all points on the same level set of the invariant function. The manifold, i.e., the geometric locus, of all the points that converge to the mixed equilibrium, i.e., the stable manifold of $\alpha$, is the one that separates the regions of attraction of the two pure NE of the game. Because of this property, we may also refer to the stable manifold of the mixed NE as the separatrix (Panageas & Piliouras, 2016). Note that, since the dynamics are also backward-invariant (Panageas & Piliouras, 2016; Mertikopoulos & Sandholm, 2018), the level set also contains a set of initial conditions that converge to the mixed equilibrium when moving backward in time. These points constitute the unstable manifold of $\alpha$. In the following lemma, we identify such an invariant for all QRD.

Lemma 4.5 (Invariant functions of QRD in $2 \times 2$ symmetric PRPGs). Given a $2 \times 2$ symmetric PRPG, $\Gamma_{w,\beta}$, whose agents evolve with respect to the q-replicator dynamics, the separable function $\Psi_q : (0, 1)^2 \to \mathbb{R}$ with $\Psi_q(x, y) := \psi_q(x) - \psi_q(y)$, where $\psi_q : (0, 1) \to \mathbb{R}$ is given by
$$\psi_q(x) = \begin{cases} \dfrac{x^{2-q} + (1-x)^{2-q} - 1}{2 - q} + \dfrac{1 - \alpha x^{1-q} - (1-\alpha)(1-x)^{1-q}}{1 - q}, & q \ne 1, 2, \\[2ex] -\alpha \ln(x) - (1 - \alpha) \ln(1 - x), & q = 1, \\[1ex] \ln(x) + \ln(1 - x) + \dfrac{\alpha}{x} + \dfrac{1 - \alpha}{1 - x}, & q = 2, \end{cases}$$
remains constant along any trajectory $\{x(t), y(t)\}_{t \ge 0}$ of the system. The function $\Psi_q(x, y)$ is continuous with respect to the parameter $q$ at both $q = 1$ and $q = 2$, since $\lim_{q \to 1} \Psi_q(x, y) = \Psi_1(x, y)$ and $\lim_{q \to 2} \Psi_q(x, y) = \Psi_2(x, y)$ for all $(x, y) \in (0, 1)^2$.

Figure 2: The invariant function, $\Psi_q(x, y)$, for all $(x, y) \in [0, 1]^2$ in the game $\Gamma_{w,\beta}$ for $w = 2$, $\beta = 0$, and various values of $q$: $q = 0$ (gradient descent), $q = 1$ (standard replicator), $q = 2$ (log-barrier), and $q = 20$. The invariant function becomes very steep at the boundary as $q$ increases, taking both arbitrarily large negative (dark) and positive (light) values in the vicinity of the NE.

Figure 4: The manifolds for $q = 0$, $q = 1$, and $q = 2$ are shown in shades of black for reference (cf. Figure 3). The region of attraction of the payoff-dominant equilibrium (bottom-left corner) shrinks as $q$ increases.

In Figure 2, we visualize the invariant function, $\Psi_q(x, y)$, for $(x, y) \in (0, 1)^2$ and various values of $q \in [0, 20]$. From the panels of Figure 2, it is also evident that $\Psi_q(x, y)$ is a handy tool to visualize the regions of attraction of the two pure NE of the game. Namely, at the unique mixed NE, i.e., at $x = y = \alpha$, the invariant function, $\Psi_q$, is equal to 0. The same holds for any point $(x, y) \in (0, 1)^2$ with $x = y$. Thus, we can factorize $\Psi_q(x, y)$ as
$$\Psi_q(x, y) = \Psi_{q,\mathrm{Stable}}(x, y) \cdot (x - y),$$
where $\Psi_{q,\mathrm{Stable}}(x, y) = 0$ is precisely the geometric locus of all points $(x, y) \in (0, 1)^2$ such that $\lim_{t \to \infty} x(t) = \alpha$, and $y = x$ is the geometric locus of all points such that $\lim_{t \to -\infty} x(t) = \alpha$. These two manifolds constitute the stable and unstable manifolds, respectively, of the q-replicator dynamics. Since the invariant function $\Psi_q(x, y)$ takes the value 0 only on the stable and unstable manifolds, we can visualize the separatrix for different values of $q$ by plotting the 0-level set of the invariant functions in Figure 2. These are depicted in Figure 3.
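The invariance claimed in Lemma 4.5 can be checked numerically. The sketch below integrates the two-action form of (QRD) in $\Gamma_{w,\beta}$ with a classical Runge-Kutta (RK4) scheme and records how much $\Psi_q$ changes along the trajectory. The integrator, the initial condition, the horizon, and the function names are assumptions for illustration. The $q = 1$ branch is written as the $q \to 1$ limit of the general branch; the overall sign of $\psi_q$ does not affect whether $\Psi_q$ is conserved.

```python
import math


def psi(x, q, alpha):
    """psi_q from Lemma 4.5 (q = 1 branch taken as the q -> 1 limit)."""
    if q == 1:
        return -(alpha * math.log(x) + (1 - alpha) * math.log(1 - x))
    if q == 2:
        return math.log(x) + math.log(1 - x) + alpha / x + (1 - alpha) / (1 - x)
    return ((x ** (2 - q) + (1 - x) ** (2 - q) - 1) / (2 - q)
            + (1 - alpha * x ** (1 - q) - (1 - alpha) * (1 - x) ** (1 - q)) / (1 - q))


def field(x, y, q, w, beta):
    """QRD in Gamma_{w,beta}; x, y are the probabilities of the first action."""
    alpha = w / (w + 1 - beta)
    c = w + 1 - beta  # u(first) - u(second) = c * (opponent's probability - alpha)

    def h(p):
        a, b = p ** q, (1 - p) ** q
        return a * b / (a + b)

    return c * h(x) * (y - alpha), c * h(y) * (x - alpha)


def max_drift(q, w=2.0, beta=0.0, x=0.3, y=0.6, dt=1e-3, steps=500):
    """Largest change of Psi_q along an RK4-integrated interior trajectory."""
    alpha = w / (w + 1 - beta)
    start = psi(x, q, alpha) - psi(y, q, alpha)
    drift = 0.0
    for _ in range(steps):
        k1 = field(x, y, q, w, beta)
        k2 = field(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], q, w, beta)
        k3 = field(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], q, w, beta)
        k4 = field(x + dt * k3[0], y + dt * k3[1], q, w, beta)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        drift = max(drift, abs(psi(x, q, alpha) - psi(y, q, alpha) - start))
    return drift


for q in (0, 0.5, 1, 2, 3):
    print(q, max_drift(q))
```

The short horizon keeps the trajectory in the interior of the unit square, where $\Psi_q$ is defined; the reported drift then reflects only the discretization error of the integrator.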
As a sanity check, we also see from Figure 3 that the region of attraction of the payoff-dominant equilibrium for $q = 0$ (GD dynamics) is larger than the region of attraction for $q = 1$ (RD).

Empirical evidence for the monotonicity of the APM with respect to q. If we stack the stable manifolds (solid blue lines) in the panels of Figure 3, it becomes evident that the region of attraction of the payoff-dominant and risk-dominant equilibrium grows as $q$ decreases to 0. This is depicted in Figure 4 for all values of $q \in [0, 10]$ (the progression of the surface remains essentially unchanged for larger $q$). Analogous plots (but with the results reversed, as predicted by Theorem 4.4) can be generated for instances of $\Gamma_{w,\beta}$ in which the risk-dominant equilibrium is different from the payoff-dominant one, as well as for generic $2 \times 2$ PRPGs (cf. section 4). In general, putting together Theorem 4.4 and the aforementioned visualizations, we have both theoretical and empirical evidence that the region of attraction of the payoff-dominant equilibrium in $\Gamma_{w,\beta}$ is decreasing (increasing) in $q$ for $q \ge 0$ whenever this equilibrium is (is not) risk-dominant. Formal verification of the monotonicity of the stable manifolds and of the regions of attraction with respect to $q$ in the QRD parametrization remains open.

Figure 3: The stable manifolds, $\Psi_{q,\mathrm{Stable}}(x, y) = 0$, (solid blue lines) for the same values of $q$ and the same instance of $\Gamma_{w,\beta}$ as in Figure 2, in which the payoff- and risk-dominant NE is at the bottom-left corner. For all $q$, the separatrix goes through the mixed NE at the intersection of the $x^*$ (dashed red) and $y^*$ (dashed black) coordinates. All panels also include the unstable manifold defined by $x - y = 0$ (dashed blue line). The region of attraction of the payoff-dominant NE is larger for all values of $q$; however, this is because this NE is also risk-dominant, cf. Theorem 4.4.

Application: APoA in 2 × 2 PRPGs.
We conclude this section by providing a concrete result regarding the evaluation of the APoA average-performance measure in the class of 2 × 2 symmetric PRPGs, which showcases the practical importance of Theorem 4.4 and the invariant function approach.

Theorem 4.6. The APoA of GD dynamics in all 2 × 2 symmetric PRPGs, Γ_{w,β}, is bounded by 2, i.e., APoA(V_0, Γ_{w,β}) ≤ 2. Furthermore, this bound is tight.

The bound also holds for β = 1 - w, but in this case, there exists no risk-dominant equilibrium. The proof of Theorem 4.6 essentially proceeds by first-order analysis of the function depicted in Figure 5 which, in turn, depends on the invariant function of the gradient descent dynamics. One way to see that this bound is tight is to set β = 1 - w + ϵ for a small ϵ > 0 and let w increase (cf. Figure 5). In combination, Theorem 4.4 and Theorem 4.6 imply that the APoA of RD (QRD with q = 1) is not upper bounded by 2 whenever α < 0.5, i.e., whenever the risk- and payoff-dominant equilibria are different. However, for the case α > 0.5, the separatrices for all q ≥ 0, as visualized in Figure 4, (empirically) imply that similar bounds hold for all values of q. In ??, we run simulations of q-replicator dynamics which provide evidence that the statement of Theorem 4.4 and the bound of Theorem 4.6 continue to hold in PRPGs of higher dimensions, i.e., beyond the 2 × 2 setting.
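The comparison between the regions of attraction under GD (q = 0) and RD (q = 1) discussed in this section can be reproduced with a rough Monte Carlo sketch. The code below is not the experimental setup of the paper. It assumes a hypothetical 2 × 2 q-replicator vector field with the mixed NE at (α, α) and unit payoff slope, samples initial conditions uniformly from [0.02, 0.98]^2, and classifies each trajectory by the first time both coordinates lie on the same side of α (from that point on, both coordinates move monotonically toward the corresponding corner under the assumed field). The value α = 0.7 is an arbitrary instance of the case α > 0.5, and the function name `basin_fraction` is made up for the example.

```python
import numpy as np

def basin_fraction(q, alpha=0.7, n=2000, dt=0.01, max_steps=20000, seed=0):
    """Estimate the share of sampled initial conditions that reach (0, 0)."""
    rng = np.random.default_rng(seed)
    x, y = rng.uniform(0.02, 0.98, size=(2, n))
    # ASSUMED speed factor of the 2x2 q-replicator field (hypothetical form).
    g = lambda z: z ** q * (1 - z) ** q / (z ** q + (1 - z) ** q)
    # -1: undecided, 0: heading to (0, 0), 1: heading to (1, 1)
    label = np.full(n, -1)
    for _ in range(max_steps):
        label[(label < 0) & (x < alpha) & (y < alpha)] = 0
        label[(label < 0) & (x > alpha) & (y > alpha)] = 1
        active = label < 0
        if not active.any():
            break
        # Explicit Euler step, applied only to still-undecided points.
        dx = g(x[active]) * (y[active] - alpha)
        dy = g(y[active]) * (x[active] - alpha)
        x[active] += dt * dx
        y[active] += dt * dy
    return float(np.mean(label == 0))

if __name__ == "__main__":
    for q in (0.0, 1.0):
        print(f"q = {q}: estimated basin share of (0, 0) = {basin_fraction(q):.3f}")
```

Under these assumptions, the GD estimate should be close to the share of the sampling square lying below the line x + y = 2α, and the RD estimate should be smaller, in line with the ordering visible in Figure 3. Points still undecided after `max_steps` are counted as not reaching (0, 0), which slightly biases the estimate downward.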

5. CONCLUSIONS

In this paper, we studied the class of q-replicator dynamics (QRD), and showed that all QRD converge pointwise to Nash equilibria in perfectly-regular potential games, a class of games that encompasses almost all potential games, i.e., the standard models of multi-agent coordination. The convergence of QRD in these settings is remarkably robust, occurring regardless of the number of agents or actions and for all possible parametrizations of QRD. From the perspective of equilibrium selection and quality, however, convergence provides little information, often none at all. Turning to this challenging problem, we provided geometric insights into the reasons why different dynamics exhibit fundamentally different performance despite their convergence to the very same set of attracting points. Our techniques leverage two intertwined yet mutually orthogonal elements of dynamical systems theory: dissipation (Lyapunov theory) and conservation (invariant functions).



Specifically, Swenson et al. (2020) show that all NEs in almost all potential games are regular in the sense of Harsanyi, i.e., they are isolated and highly robust (Harsanyi, 1973; van Damme, 1987). Almost all refers to a set whose complement is a closed set with Lebesgue measure zero.

Note about notation: In a 2 × 2 game, one usually abuses notation and writes x, y ∈ [0, 1] (instead of (x, 1 - x) and (y, 1 - y)) to denote the mixed choice distributions of players 1 and 2, respectively. Then, all notions that we presented in Section 2 may be viewed as functions of x, y. For example, we could have written SW(1, 1) to denote the social welfare of the NE that corresponds to x = y = 1. In this section we are going to interchange between the two notations, but our choice is always going to be clear from the context.

For this definition, recall that a probability measure µ on a compact space X is a σ-additive function from the powerset of X to R+ such that µ(X) = 1 and µ(X′) ≥ 0 for all X′ ⊆ X.

To avoid confusion, in Figure 4, we visualize the stable manifolds for the case in which GD is the dynamic with the largest region of attraction, i.e., has the lowest APoA. The case α < 0.5, in which the manifolds are simply mirrored on the y = 1 - x diagonal, is in ??.



Figure 1: Vector fields of gradient descent (top) and replicator dynamics (bottom) for a game with payoff- and risk-dominant equilibrium at the bottom-left corner. The trajectories in the region of attraction of the good (bad) equilibrium are shown in gray (red). The black solid and dashed lines show the stable and unstable manifolds, respectively. In this case, gradient descent outperforms replicator dynamics.

Game-theoretic model. A multi-agent finite potential game Γ := {N, (A_k, u_k)_{k∈N}, Φ} denotes the interaction between a set N := {1, …, n} of agents. Each agent k ∈ N has a finite set of actions, A_k, with size |A_k|, and a reward function u_k : A → R, where A := ∏_{k∈N} A_k is the set of all pure action profiles of Γ. Agents may use mixed actions or choice distributions,

Figure 4: Stable manifold (separatrix) for all different values of q ∈ [0, 10] (from blue to brown) in the Γ_{w,β} game for w = 2 and β = 0. The manifolds for q = 0, q = 1, and q = 2 are shown in shades of black for reference (cf. Figure 3). The region of attraction of the payoff-dominant equilibrium (bottom-left corner) shrinks as q increases.

Figure 5: APoA of a 2 × 2 symmetric PRPG for the gradient descent dynamics and various values of β and w. The APoA is upper bounded by 2 (dark to light values) as shown in Theorem 4.6.

