NO-REGRET LEARNING IN STRONGLY MONOTONE GAMES CONVERGES TO A NASH EQUILIBRIUM

Abstract

This paper studies a class of online games involving multiple agents with continuous actions that aim to minimize their local loss functions. An open question in the study of online games is whether no-regret learning for such agents leads to a Nash equilibrium. We address this question by providing a sufficient condition for strongly monotone games that guarantees Nash equilibrium convergence in a time-average sense, regardless of the specific learning algorithm, assuming only that it is no-regret. Furthermore, we show that the class of games for which no-regret learning leads to a Nash equilibrium can be expanded if some further information on the learning algorithm is known. Specifically, we provide relaxed sufficient conditions for first-order and zeroth-order gradient descent algorithms, as well as for best response algorithms in which agents choose actions that best respond to the other agents' actions during the last episode. We analyze the convergence rate for these algorithms and present numerical experiments on three economic market problems to illustrate their performance.

1. INTRODUCTION

Online convex optimization (Hazan et al., 2016; Shalev-Shwartz et al., 2012) is used to solve decision-making problems where the cost function is unknown and optimal actions must be selected with only incomplete information. Recently, online convex optimization has also been employed to solve games involving multiple agents, with applications ranging from traffic routing (Sessa et al., 2019) to economic market optimization (Narang et al., 2022; Wang et al., 2022; Lin et al., 2020). In these online convex games, agents simultaneously take actions to minimize their loss functions, which depend on the other agents' actions. Generally, agents in non-cooperative games have access to limited information. For example, they may not be able to observe the actions of other agents and may not even know the exact game mechanism. As a result, rational agents will focus on sequentially learning their individual optimal actions at the expense of other agents, and their ability to do so efficiently can be quantified using notions of regret that capture the cumulative loss of the learned online actions compared to the best action in hindsight. An algorithm is said to achieve no-regret learning if the regret of the sequence of online actions generated by this algorithm is sublinear in the total number of episodes T. While no-regret learning has been studied for a variety of games, see, e.g., Sessa et al. (2019); Tatarenko & Kamgarpour (2018); Wang et al. (2022); Daskalakis et al. (2021); Anagnostides et al. (2022), the analysis of regret alone is not sufficient to characterize the limit points of a learning algorithm, i.e., of the sequence of actions taken by the algorithm. In fact, no-regret learning may not converge at all and can exhibit limit cycles, as shown in Mertikopoulos et al. (2018). In this paper, we adopt the notion of a Nash equilibrium, which describes a stable point at which the agents have no incentive to change their actions, to analyze the convergence properties of no-regret algorithms for online convex games.

A growing literature has recently focused on showing Nash equilibrium convergence for online games; see, e.g., Bravo et al. (2018); Tatarenko & Kamgarpour (2020); Drusvyatskiy & Ratliff (2021); Lin et al. (2021; 2020); Mertikopoulos & Zhou (2019); Narang et al. (2022); Heliou et al. (2017); Golowich et al. (2020); Azizian et al. (2020). For example, for potential games with finite actions, Heliou et al. (2017) show that the sequence of play returned by the exponential weight algorithm converges to the Nash equilibrium. On the other hand, for games with continuous actions, strong monotonicity, which ensures the uniqueness of the Nash equilibrium (Rosen, 1965), is a sufficient condition for Nash equilibrium convergence for many specific learning algorithms, including the mirror descent algorithm (Bravo et al., 2018; Lin et al., 2021), the dual averaging algorithm (Mertikopoulos & Zhou, 2019), and the derivative-free algorithm (Drusvyatskiy & Ratliff, 2021; Narang et al., 2022). In addition, an optimistic gradient algorithm is proposed in Golowich et al. (2020) that achieves tight last-iterate convergence for smooth monotone games. Similarly, Lin et al. (2020) investigate last-iterate convergence for continuous games with unconstrained action sets that satisfy a so-called 'cocoercive' condition, which includes a broader class of games with potentially many Nash equilibria. However, all these works analyze the convergence and/or regret of specific learning algorithms, under assumptions that depend on this specific choice of algorithms and games. In this paper, we follow a different approach and focus on understanding for what classes of games and learning algorithms Nash equilibrium convergence can be guaranteed.
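For concreteness, the regret referred to above is standardly defined as follows (the notation here is ours, introduced only for illustration: $\ell_i$ denotes agent $i$'s loss, $x_i^t$ its action in episode $t$, $x_{-i}^t$ the other agents' actions, and $\mathcal{X}_i$ its feasible action set):

$$\mathrm{Reg}_i(T) \;=\; \sum_{t=1}^{T} \ell_i(x_i^t, x_{-i}^t) \;-\; \min_{x \in \mathcal{X}_i} \sum_{t=1}^{T} \ell_i(x, x_{-i}^t),$$

and a learning algorithm is no-regret if $\mathrm{Reg}_i(T) = o(T)$ for every agent $i$, i.e., if the time-average regret vanishes as $T \to \infty$.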
Specifically, we are interested in understanding whether, and for what class of online convex games with continuous action sets, no-regret learning converges to a Nash equilibrium regardless of the specific algorithm. Moreover, we are interested in understanding whether and how this class of online convex games can be expanded when the no-regret learning algorithm is known. In our main result, we show that for m-strongly monotone games with parameter m > 2L√(N−1), where L is the Lipschitz constant of the gradient function with respect to the actions of the other agents and N is the number of agents, any no-regret algorithm leads to Nash equilibrium convergence. While Nash equilibrium convergence has been analyzed for different combinations of learning algorithms and games, to the best of our knowledge, this is the first effort to understand for what classes of games and learning algorithms Nash equilibrium convergence can be guaranteed, and thus to bridge regret analysis with Nash equilibrium convergence in games. We note that this result applies to any no-regret algorithm and thus can provide theoretical support for the convergence of any such algorithm for which regret analysis is easy but Nash equilibrium convergence is difficult to show.

Furthermore, we show that the class of games satisfying m > 2L√(N−1) can be expanded if additional information about a specific no-regret algorithm is known. First, for the class of gradient descent (GD) algorithms, including first-order and zeroth-order algorithms, we show that m > 0 is a sufficient condition for Nash equilibrium convergence. Note that Drusvyatskiy & Ratliff (2021) also show convergence of the zeroth-order algorithm to a Nash equilibrium, but with the additional assumption that the Jacobian of the gradient function is Lipschitz continuous, which they need to ensure that the smoothed game induced by the zeroth-order oracle is strongly monotone. In this work, we show that this assumption is not necessary and that Nash equilibrium convergence can still be guaranteed even when the smoothed game is not strongly monotone. In addition, we study the class of best response algorithms, where every agent selects the best action in the next episode given the other agents' current actions. Best response algorithms have been studied for several classes of games, including potential games (Swenson et al., 2018; Durand & Gaujal, 2016) and zero-sum games (Leslie et al., 2020). However, none of these works study games with continuous actions or provide sufficient conditions that guarantee convergence to a Nash equilibrium. We show that for m-strongly monotone games, the best response algorithm ensures Nash equilibrium convergence if m > L√(N−1). This is, to the best of our knowledge, the first convergence analysis of the best response algorithm in continuous games.

We numerically validate the proposed algorithms using three economic market examples, specifically the Cournot game, the Kelly auction, and the retailer pricing competition, which satisfy different conditions on the parameter m and, therefore, belong to different game classes. We show that for games that do not satisfy the sufficient condition m > L√(N−1), such as the Cournot game, the best response algorithm may diverge. As a result, gradient descent algorithms may be better suited to solve games in this class. We also compare the performance of these algorithms on games for which Nash equilibrium convergence is guaranteed.
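To make the best response dynamic concrete, the following is a minimal sketch (our own toy example, not code from the paper) of simultaneous best responses in a linear N-firm Cournot game; the demand and cost parameters a, b, c and the closed-form best response are assumptions of this toy model. Consistent with the discussion above, such a Cournot game need not satisfy m > L√(N−1), and with the parameters below the iterates oscillate around the Nash equilibrium instead of converging.

```python
import numpy as np

# Toy illustration: simultaneous best response dynamics in a linear
# Cournot game with N firms. Firm i picks a quantity q_i >= 0; the
# market price is p = a - b * sum(q), and firm i's loss is
# -q_i * (p - c). The best response to the opponents' quantities has
# the closed form q_i = max(0, (a - c - b * sum_{j != i} q_j) / (2b)).

a, b, c = 10.0, 1.0, 2.0   # hypothetical demand/cost parameters
N = 3                      # number of firms
T = 50                     # number of episodes

q = np.ones(N)             # initial quantities
for t in range(T):
    others = q.sum() - q   # each firm's opponents' total quantity
    # simultaneous best response update (projected onto q >= 0);
    # for N = 3 this oscillates between two action profiles
    q = np.maximum(0.0, (a - c - b * others) / (2.0 * b))

# Unique Nash equilibrium of this game: q_i* = (a - c) / (b * (N + 1)).
q_star = (a - c) / (b * (N + 1))
print("final iterate:", q, " Nash equilibrium:", q_star)
```

Replacing the best response update with a small projected gradient step, q ← max(0, q − η∇ℓ), would converge on this example, in line with the milder sufficient condition m > 0 for gradient descent stated above.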
In these experiments, we observe that when m > L√(N−1), the best response algorithm outperforms first-order gradient descent, which, in turn, outperforms the zeroth-order method. On the other hand, when 0 < m < L√(N−1), first-order gradient descent outperforms the zeroth-order method. In summary, by defining sufficient conditions for Nash equilibrium convergence that depend only on the properties of the game, i.e., on the parameter m, and not on the learning algorithm used to solve it, our analysis makes it possible to identify classes of games for which no-regret learning guarantees convergence to a Nash equilibrium without analyzing specific algorithms, or to identify specific no-regret learning algorithms with no guaranteed convergence to a Nash equilibrium, both fundamental questions in the analysis of online convex games.
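For completeness, the zeroth-order methods compared above replace the true gradient with an estimate built from loss evaluations alone. A minimal sketch of one common construction, the single-point bandit estimator, is given below; the function name and parameters are ours, and the only assumption is that the agent can query its loss at a perturbed action.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm) of a one-point
# zeroth-order gradient estimate: only a single loss evaluation at a
# randomly perturbed action is needed, no first-order oracle.
def zeroth_order_gradient(loss, x, delta, rng):
    """Estimate the gradient of loss at x from one loss query.

    loss : callable mapping an action vector to a scalar loss
    x    : current action (1-D numpy array)
    delta: smoothing radius (> 0); the estimate is unbiased for the
           delta-smoothed loss, not for the loss itself
    """
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)               # uniform direction on the unit sphere
    return (d / delta) * loss(x + delta * u) * u

# Usage: plug the estimate into a projected gradient step.
rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
loss = lambda z: float(z @ z)            # stand-in for an agent's loss
g_hat = zeroth_order_gradient(loss, x, delta=0.1, rng=rng)
x = np.clip(x - 0.05 * g_hat, 0.0, None) # gradient step + projection onto [0, inf)
```

The high variance of this single-query estimate is one reason the zeroth-order method trails its first-order counterpart in the comparisons above.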


