CONVERGENCE IS NOT ENOUGH: AVERAGE-CASE PERFORMANCE OF NO-REGRET LEARNING DYNAMICS

Anonymous

Abstract

Learning in games involves two main challenges, even in settings in which agents seek to coordinate: convergence to equilibria and selection of good equilibria. Unfortunately, solving the issue of convergence, which is the focus of state-of-the-art models, conveys little information about the quality of the equilibria that are eventually reached, often none at all. In this paper, we study a class of arbitrary-sized games in which the q-replicator dynamics (QRD), a widely studied class of no-regret learning dynamics that includes gradient descent (GD), standard replicator dynamics (RD), and log-barrier dynamics as special cases, can be shown to converge pointwise to Nash equilibria. Turning to our main task, we provide both theoretical and experimental results on the average-case performance of different learning dynamics in games. For example, in the case of GD, we show a tight average Price of Anarchy (APoA) bound of 2 for a class of symmetric 2 × 2 potential games with unbounded Price of Anarchy (PoA). Furthermore, in the same class, we provide necessary and sufficient conditions under which GD outperforms RD in an average-case analysis, giving novel insights into two of the most widely applied dynamics in game theory. Finally, our experiments suggest that unbounded gaps between average-case performance and PoA analysis are common, indicating a fertile area for future work.
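For reference, the q-replicator family mentioned above is commonly written as follows. This is a sketch of the standard parametrization from the literature; the notation (x_{ia} for probabilities, u_{ia} for payoffs) is our assumption rather than a quotation of this paper's own definitions.

```latex
% Sketch of the standard q-replicator dynamics (QRD); notation assumed.
% x_{ia}: probability that player i plays strategy a; u_{ia}(x): its payoff.
\[
\dot{x}_{ia} \;=\; x_{ia}^{\,q}\!\left(u_{ia}(x)\;-\;
   \frac{\sum_{b} x_{ib}^{\,q}\,u_{ib}(x)}{\sum_{b} x_{ib}^{\,q}}\right),
\qquad q \ge 0.
\]
% q = 0 recovers (projected) gradient descent, q = 1 the standard
% replicator dynamics, and q = 2 the log-barrier dynamics.
```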

1. INTRODUCTION

Multi-agent coordination often involves the solution of complex optimization problems. What makes these problems so hard, even when agents have common (Bard et al., 2020) or aligned interests (Dafoe et al., 2020; Dafoe et al., 2021), is that learning occurs on highly non-convex landscapes; thus, even if the learning dynamics equilibrate, their fixed points may include unnatural saddle points or even local minima of very poor performance (Dauphin et al., 2014). To address this issue, a large stream of recent work has focused on the convergence of optimization-driven (e.g., no-regret) learning dynamics to good limit points. Notable results include avoidance of saddle points and convergence of first-order methods, e.g., gradient descent, to local optima (Ge et al., 2015; Lee et al., 2019; Mertikopoulos et al., 2019), pointwise or last-iterate convergence of various learning dynamics to (proper notions of) equilibria in zero-sum (competitive) games (Daskalakis & Panageas, 2019; Bailey & Piliouras, 2019; Cai et al., 2022), and convergence of no-regret learning to stable points in potential (cooperative) games (Heliou et al., 2017; Palaiopanos et al., 2017; arXiv:2203.12056; Leonardos, 2022). Even though these results seem to provide a sufficient starting point to reason about the quality of the collective learning outcome, this is unfortunately far from true. Non-trivial game settings routinely possess attracting points of vastly different performance, and this remains true even if one restricts attention to refined and highly robust notions of equilibria (Flokas et al., 2020). Nevertheless, and despite the intense interest of the machine learning community in the problem of equilibrium selection, there is a remarkable scarcity of work in this direction. To make matters worse, static, game-theoretic approaches to the problem (Harsanyi, 1973; Harsanyi & Selten, 1988; van Damme, 1987) offer little insight, often none at all, from a dynamic/learning perspective. In this case, the challenge is to show approximately optimal performance not for (almost) all initial conditions (which is not possible), but in expectation, i.e., for uniformly random initial conditions (average-case rather than worst-case analysis). This is a fundamentally hard problem, since one has to couple the performance of equilibria with the relative sizes of their regions of attraction. However, regions of attraction are complex geometric manifolds that quickly become mathematically intractable even in low-dimensional settings. Importantly, their analysis requires combining tools from machine learning, game theory, non-convex optimization, and dynamical systems.

In terms of average-case analysis of game-theoretic dynamics in coordination/common-interest games, the only other references we are aware of are Zhang & Hofbauer (2015) and Panageas & Piliouras (2016). In fact, Panageas & Piliouras (2016) is the key precursor to our work. Critically, whereas Panageas & Piliouras (2016) focuses exclusively on a single dynamic, namely replicator dynamics, and on bounding its average Price of Anarchy (APoA) in restricted instances of games such as Stag Hunt, we show how these techniques can be applied much more broadly by addressing the following novel challenges (a simulation sketch of the average-case methodology follows this list):

• Axiomatic challenge: Can we formally define the notion of Average Price of Anarchy for large classes of dynamics and games?
• Analytical challenge: Even if the definitions can be made robust, how do we analyze these nonlinear dynamical systems given random initial conditions in the presence of multiple attractors?
• Experimental/visualization challenge: Can we develop novel custom visualization techniques and showcase that our experimental results have predictive power even in complex, high-dimensional settings?

Results of this type typically establish pointwise convergence to equilibria or prove convergence to limit cycles of (restricted) equilibrium points (Mertikopoulos & Sandholm, 2018). However, whereas in previous works such results are the main focus, in our case they are only the starting point, as they clearly do not suffice to explain the disparity between the regularity of QRD in theory (bounded regret, convergence to Nash equilibria) and their conflicting performance in practice (agents' utilities after learning). We then turn to our second question and the fundamental problem of equilibrium quality. While different QRD dynamics may reach the same asymptotically stable equilibria, this is only a minimal and certainly not a sufficient condition for comparing their performance. In particular, the regions of attraction of these common attracting equilibria, i.e., the sets of convergent initial conditions, can be very different for different QRD dynamics.
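As a concrete illustration of this average-case methodology, the following is a minimal Monte Carlo sketch, not the paper's experimental code: it discretizes gradient descent (q = 0) and replicator (q = 1) dynamics on a symmetric 2 × 2 coordination game and averages the resulting welfare over uniformly random initial conditions. The payoff values, step size, and horizon are illustrative assumptions.

```python
# Monte Carlo estimate of average-case performance: discretized gradient
# descent (q = 0) vs. replicator (q = 1) dynamics on a symmetric 2x2
# coordination game, averaged over uniformly random initial conditions.
import numpy as np

rng = np.random.default_rng(0)
a, b = 10.0, 1.0   # illustrative payoffs for coordinating on A / on B

def payoff_gap(p_other):
    """u(A) - u(B) when the opponent plays A with probability p_other."""
    return a * p_other - b * (1.0 - p_other)

def average_welfare(dynamics, trials=2_000, steps=2_000, dt=0.05):
    # p1, p2: probabilities that players 1 and 2 play A, one per trial.
    p1, p2 = rng.uniform(size=trials), rng.uniform(size=trials)
    for _ in range(steps):
        g1, g2 = payoff_gap(p2), payoff_gap(p1)   # simultaneous update
        if dynamics == "gd":
            p1 = np.clip(p1 + dt * g1, 0.0, 1.0)  # Euclidean projection
            p2 = np.clip(p2 + dt * g2, 0.0, 1.0)
        else:                                     # replicator
            p1 = p1 + dt * p1 * (1.0 - p1) * g1
            p2 = p2 + dt * p2 * (1.0 - p2) * g2
    # Expected social welfare at the (near-)limit point of each trial.
    welfare = 2.0 * (a * p1 * p2 + b * (1.0 - p1) * (1.0 - p2))
    return welfare.mean()

opt = 2.0 * a  # welfare of the payoff-dominant equilibrium (A, A)
for dyn in ("gd", "replicator"):
    avg = average_welfare(dyn)
    print(f"{dyn:10s}: avg welfare {avg:6.2f}, APoA estimate {opt/avg:5.2f}")
```

Comparing the two printed estimates illustrates the point above: both dynamics converge to the same set of equilibria, yet their basins of attraction, and hence their average welfare, can differ.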



Specifically, Swenson et al. (2020) shows that all NEs in almost all potential games are regular in the sense of Harsanyi, i.e., they are isolated and highly robust (Harsanyi, 1973; van Damme, 1987). Here, "almost all" refers to a set whose complement is a closed set with Lebesgue measure zero.



Figure 1: Vector fields of gradient descent (top) and replicator dynamics (bottom) for a game with payoff- and risk-dominant equilibrium at the bottom-left corner. The trajectories in the region of attraction of the good (bad) equilibrium are shown in gray (red). The black solid and dashed lines show the stable and unstable manifolds, respectively. In this case, gradient descent outperforms replicator dynamics.
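A minimal sketch of how vector-field plots of this kind can be reproduced is given below. The game, grid resolution, and styling are assumptions for illustration; this is not the authors' visualization code, and it omits the stable/unstable manifolds and basin coloring shown in Figure 1.

```python
# Side-by-side vector fields of gradient descent and replicator dynamics
# on a symmetric 2x2 coordination game (illustrative sketch).
import numpy as np
import matplotlib.pyplot as plt

a, b = 10.0, 1.0                                  # assumed payoffs
p1, p2 = np.meshgrid(np.linspace(0, 1, 25), np.linspace(0, 1, 25))
g1 = a * p2 - b * (1 - p2)                        # u1(A) - u1(B)
g2 = a * p1 - b * (1 - p1)                        # u2(A) - u2(B)

fields = {
    # q = 0; boundary projection omitted for plotting purposes
    "gradient descent": (g1, g2),
    # q = 1
    "replicator": (p1 * (1 - p1) * g1, p2 * (1 - p2) * g2),
}
fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, (name, (u, v)) in zip(axes, fields.items()):
    ax.streamplot(p1, p2, u, v, color="gray", density=1.2)
    ax.set(title=name, xlabel="Pr[player 1 plays A]",
           ylabel="Pr[player 2 plays A]")
plt.tight_layout()
plt.show()
```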

