NO-REGRET LEARNING IN STRONGLY MONOTONE GAMES CONVERGES TO A NASH EQUILIBRIUM

Abstract

This paper studies a class of online games involving multiple agents with continuous actions, each aiming to minimize its local loss function. An open question in the study of online games is whether no-regret learning for such agents leads to a Nash equilibrium. We address this question by providing a sufficient condition for strongly monotone games that guarantees convergence to a Nash equilibrium in a time-average sense, regardless of the specific learning algorithm used, assuming only that it is no-regret. Furthermore, we show that the class of games for which no-regret learning leads to a Nash equilibrium can be expanded if further information about the learning algorithm is available. Specifically, we provide relaxed sufficient conditions for first-order and zeroth-order gradient descent algorithms, as well as for best response algorithms in which agents choose actions that best respond to the other agents' actions from the previous episode. We analyze the convergence rates of these algorithms and present numerical experiments on three economic market problems to illustrate their performance.

1. INTRODUCTION

Online convex optimization (Hazan et al., 2016; Shalev-Shwartz et al., 2012) is used to solve decision-making problems where the cost function is unknown and optimal actions must be selected with only incomplete information. Recently, online convex optimization has also been employed to solve games involving multiple agents, with applications ranging from traffic routing (Sessa et al., 2019) to economic market optimization (Narang et al., 2022; Wang et al., 2022; Lin et al., 2020). In these online convex games, agents simultaneously take actions to minimize their loss functions, which depend on the other agents' actions. Generally, agents in non-cooperative games have access to limited information. For example, they may not be able to observe the actions of other agents and may not even know the exact game mechanism. As a result, rational agents will focus on sequentially learning their individual optimal actions at the expense of the other agents, and their ability to do so efficiently can be quantified using notions of regret that capture the cumulative loss of the learned online actions compared to the best actions in hindsight. An algorithm is said to achieve no-regret learning if the regret of the sequence of online actions generated by this algorithm is sub-linear in the total number of episodes T. While no-regret learning has been studied for a variety of games (see, e.g., Sessa et al. (2019); Tatarenko & Kamgarpour (2018); Wang et al. (2022); Daskalakis et al. (2021); Anagnostides et al. (2022)), the analysis of regret alone is not sufficient to characterize the limit points of a learning algorithm, i.e., the sequence of actions taken by the algorithm. In fact, no-regret learning may not converge at all and can exhibit limit cycles, as shown in Mertikopoulos et al. (2018).

In this paper, we adopt the notion of a Nash equilibrium, which describes a stable point at which the agents have no incentives to change their actions, to analyze the convergence properties of no-regret algorithms for online convex games. A growing literature has recently focused on showing Nash equilibrium convergence for online games; see, e.g., Bravo et al. (2018); Tatarenko & Kamgarpour (2020); Drusvyatskiy & Ratliff (2021); Lin et al. (2021; 2020); Mertikopoulos & Zhou (2019); Narang et al. (2022); Heliou et al. (2017); Golowich et al. (2020); Azizian et al. (2020). For example, for potential games with finite actions, Heliou et al. (2017) show that the sequence of play returned by the exponential weight algorithm converges to the Nash equilibrium. On the other hand, for games with continuous actions, strong monotonicity, which ensures the uniqueness of the Nash equilibrium, is a common assumption used to establish convergence of learning dynamics.
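For concreteness, the notions used above can be stated formally; the notation here is generic and may differ from the formal setup introduced later in the paper. The regret of agent i with loss function \ell_i after T episodes, and the corresponding no-regret property, read

R_i(T) = \sum_{t=1}^{T} \ell_i(x_i^t, x_{-i}^t) - \min_{x_i \in \mathcal{X}_i} \sum_{t=1}^{T} \ell_i(x_i, x_{-i}^t), \qquad R_i(T) = o(T),

where x_i^t denotes agent i's action and x_{-i}^t the other agents' actions at episode t. Moreover, a game with gradient map F(x) = (\nabla_{x_1} \ell_1(x), \ldots, \nabla_{x_N} \ell_N(x)) is \mu-strongly monotone if

\langle F(x) - F(y), x - y \rangle \ge \mu \| x - y \|^2 \quad \text{for all } x, y,

which in particular implies that the Nash equilibrium is unique.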


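To illustrate the behavior analyzed in this paper, the following minimal Python sketch (ours, for illustration only; all names and parameter choices are assumptions rather than the paper's implementation) runs simultaneous online projected gradient descent, a standard no-regret algorithm, on a two-player strongly monotone quadratic game and compares the time-averaged play with the unique Nash equilibrium:

import numpy as np

# Two-player quadratic game: loss_i(x) = 0.5*a[i]*x[i]**2 + b*x[0]*x[1] + c[i]*x[i].
# Its gradient map F(x)_i = a[i]*x[i] + b*x[1-i] + c[i] is strongly monotone
# whenever a[0], a[1] > |b| (the symmetric Jacobian [[a0, b], [b, a1]] is positive definite).
a, b, c = np.array([2.0, 3.0]), 0.5, np.array([1.0, -1.0])

def partial_grad(x, i):
    # Gradient of agent i's loss with respect to its own action x[i].
    return a[i] * x[i] + b * x[1 - i] + c[i]

def project(z, lo=-10.0, hi=10.0):
    # Euclidean projection onto a compact action interval keeps iterates bounded.
    return np.clip(z, lo, hi)

T = 5000
x = np.array([5.0, -5.0])          # arbitrary initial actions
running_sum = np.zeros(2)

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)         # a standard diminishing step size for no-regret play
    grads = np.array([partial_grad(x, i) for i in range(2)])
    x = project(x - eta * grads)   # simultaneous projected gradient step
    running_sum += x

x_avg = running_sum / T
# Unique Nash equilibrium in closed form: solve a[i]*x[i] + b*x[1-i] + c[i] = 0 for both i.
x_star = np.linalg.solve(np.array([[a[0], b], [b, a[1]]]), -c)
print("time-averaged play:", x_avg, "  Nash equilibrium:", x_star)

With these illustrative parameters the unique equilibrium is approximately (-0.61, 0.43), and the time-averaged play should approach it, in line with the time-average convergence guarantee discussed above.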