STOCHASTIC NO-REGRET LEARNING FOR GENERAL GAMES WITH VARIANCE REDUCTION

Abstract

We show that a stochastic version of optimistic mirror descent (OMD), a variant of mirror descent with recency bias, converges fast in general games. More specifically, with our algorithm, the individual regret of each player vanishes at a rate of O(1/T^{3/4}) and the sum of all players' regrets vanishes at a rate of O(1/T), where T is the number of interaction rounds; this improves upon the O(1/√T) convergence rate of prior stochastic algorithms. Owing to the computational advantage of stochastic methods, we significantly improve on the time complexity of deterministic algorithms for approximating coarse correlated equilibrium. To achieve this lower time complexity, we equip the stochastic version of OMD in (AM21) with a novel low-variance Monte-Carlo estimator. Our algorithm extends previous works (AM21; CJST19) from two-player zero-sum games to general games.

1. INTRODUCTION

How does a player in a game interact with others and selfishly maximize its own utility? This is a central problem in online learning and game theory and has intimate connections to economics, auction design, and machine learning. The study of this problem was pioneered by (Bro49; Rob51). Robinson (Rob51) shows that fictitious play asymptotically converges to a Nash equilibrium in two-player zero-sum games, but its convergence rate is exponentially slow and it may not even converge in non-zero-sum games (Sha64). Another natural choice for each player is to use no-regret learning algorithms. With some well-known families of no-regret learning algorithms, e.g., mirror descent (NY83) and follow-the-regularized-leader (KV05), the average regret of each player vanishes at a rate of O(1/√T), where T is the number of interaction rounds. This regret bound implies an O(1/√T) convergence rate to coarse correlated equilibrium in general games (or to Nash equilibrium in two-player zero-sum games). It is noteworthy that Chen and Peng (CP20) show that the convergence rate of these algorithms is also Ω(1/√T). Players can do even better with no-regret algorithms specially tailored for games. However, the computational cost for players to use OMD, as well as other deterministic no-regret algorithms, can be unmanageable: each player needs to compute its exact loss vector to update its strategy, and the time complexity of computing this exact loss vector is, in the worst case, exponential in the number of players in the game. One standard method to accelerate the computation is to estimate the loss vector with Monte-Carlo methods, but a Monte-Carlo estimator with uncontrolled variance immediately makes the convergence rate degenerate to O(1/√T). While Carmon et al. (CJST19) and Alacaoglu et al. (AM21) make a major step towards developing efficient stochastic algorithms for games, their algorithms are tailored to the simplest two-player zero-sum setting and do not cover more practical settings, such as auctions, which may involve multiple players and need not be zero-sum. One crucial ingredient of the algorithms in Carmon et al. (CJST19) and Alacaoglu et al. (AM21) is a stochastic loss estimator with small variance. However, the time complexity of computing this estimator is O(A^{N-1}), where N is the number of players and A is the number of actions per player, which is exponentially large in general games. The high complexity of the estimator is a major obstacle to developing efficient stochastic algorithms for general games.

Contributions. We consider general normal-form games with an arbitrary number of players. Compared to the two-player zero-sum case, this setting is more challenging and practically significant.
We show that in general games, a stochastic version of OMD converges to the optimal social welfare (equivalently, minimizes the sum of all players' regrets) at a rate of Õ(1/T) and minimizes each player's individual regret at a rate of O(1/T^{3/4}), in contrast to the O(1/√T) convergence rate of existing stochastic algorithms. Owing to the computational advantage of stochastic methods, this significantly improves the time complexity of approximating coarse correlated equilibrium in general games. Please see Table 1 for a comparison of the time complexity of our algorithm against prior works. Specifically, our result improves upon previous works for weak ϵ-CCE when Cost ≥ NA and for strong ϵ-CCE when Cost ≥ NA^2/ϵ. To achieve the above regret bounds, we make two main technical contributions. First, we extend the theoretical framework for analyzing regret bounds of stochastic OMD in (AM21) from two-player zero-sum games to general games. Second, we propose a novel low-variance Monte-Carlo estimator for general games. Computing this estimator is exponentially faster than computing the estimators of Carmon et al. (CJST19) and Alacaoglu et al. (AM21), while its variance is only slightly larger. The stochastic OMD algorithm equipped with our novel estimator achieves the above results.

The rest of the paper is organized as follows. In Section 2, we discuss prior work related to this problem. In Section 3, we provide the necessary preliminaries on games, coarse correlated equilibrium, and optimistic mirror descent. In Section 4, we introduce our algorithm and present a general regret upper bound in Theorem 1. In Section 5, we introduce our low-variance Monte-Carlo estimator and analyze its variance in Lemma 3. In Section 6, by combining the results of Theorem 1 and Lemma 3, we present our final regret bounds in Theorems 2 and 3, as well as the time complexity of approximating coarse correlated equilibrium in Corollaries 1 and 2.
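To make the computational barrier above concrete, the following sketch (illustrative code under our own naming, e.g. `exact_loss` and `sampled_loss`; this is not the paper's estimator) contrasts the exact loss vector of one player, whose cost scales as A^{N-1} over the opponents' joint actions, with a one-sample Monte-Carlo estimate obtained by drawing each opponent's action from its current mixed strategy:

```python
import numpy as np

rng = np.random.default_rng(0)

def exact_loss(player, loss_tensor, strategies):
    # Exact expected loss vector for `player`: contract every opponent's axis
    # of the loss tensor with that opponent's mixed strategy.  The surviving
    # axis is `player`'s own action; the work scales as A^(N-1).
    vec = loss_tensor
    for q in reversed(range(loss_tensor.ndim)):  # descending order keeps axis indices valid
        if q != player:
            vec = np.tensordot(vec, strategies[q], axes=([q], [0]))
    return vec

def sampled_loss(player, loss_tensor, strategies):
    # One-sample Monte-Carlo estimate: draw one action per opponent and read
    # off a single slice of the tensor -- only O(N) sampling work per round.
    # It is unbiased, but uncontrolled variance slows plain stochastic methods.
    idx = tuple(slice(None) if q == player
                else rng.choice(len(strategies[q]), p=strategies[q])
                for q in range(loss_tensor.ndim))
    return loss_tensor[idx]
```

Averaged over the opponents' randomness, `sampled_loss` matches `exact_loss`; the difficulty addressed in this paper is keeping the variance of such estimators small enough that the fast convergence rates survive.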

2. RELATED WORK

Comparisons to existing algorithms. Table 1 compares the time complexity of our algorithm for computing an ϵ-coarse correlated equilibrium in general games (and an ϵ-Nash equilibrium in two-player zero-sum games) against prior no-regret algorithms. The time complexity is determined by two factors: the convergence rate (or the regret) and the computational cost per round. Deterministic algorithms (PSS21; DFG21; SALS15) converge fast, but with a relatively high per-round time complexity, since they have to compute the exact loss in each round. Stochastic algorithms exploit the Monte-Carlo approach to accelerate the computation of the loss; however, the variance of the estimated loss may slow down the convergence rate. To alleviate the effect of the Monte-Carlo estimator's variance, Carmon et al. (CJST19) and Alacaoglu et al. (AM21) propose variance-reduced stochastic no-regret algorithms with a convergence rate of O(1/T) for two-player zero-sum games. As a result, they improve the time complexity of computing an ϵ-Nash equilibrium in two-player zero-sum games from the O(Cost/ϵ) of deterministic algorithms to O(Cost + √Cost/ϵ) (omitting lower-order terms), where Cost is the time complexity of computing the loss vector.

Variance reduction. Variance reduction is one of the most useful techniques for accelerating stochastic algorithms (see (GSBR20) for a comprehensive survey). Typically, when optimizing the finite-sum problem min_x F(x) = Σ_{i=1}^N F_i(x), instead of estimating the gradient by ∇F_i(x) as in Stochastic Gradient Descent (SGD), variance reduction methods estimate ∇F(x) by ∇F(w_k) + N(∇F_i(x) − ∇F_i(w_k)), where i is sampled uniformly and w_k is a periodically refreshed snapshot point at which the full gradient is computed; the closer x is to w_k, the smaller the variance of this estimator.
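The finite-sum estimator just described can be sketched on a toy least-squares objective (generic SVRG-style code under our own naming; the paper's game-specific estimator differs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy finite sum F(x) = sum_i F_i(x) with F_i(x) = 0.5 * (a_i @ x - b_i)^2.
N, d = 50, 5
A = rng.normal(size=(N, d))
b = rng.normal(size=N)

def grad_i(x, i):
    # Gradient of a single component F_i -- O(d) work per call.
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    # Gradient of the whole sum -- O(N*d) work, computed only at snapshots.
    return A.T @ (A @ x - b)

def svrg_estimate(x, snapshot, snapshot_grad, i):
    # Variance-reduced estimate of full_grad(x): unbiased when i is sampled
    # uniformly, and its variance vanishes as x approaches the snapshot.
    return snapshot_grad + N * (grad_i(x, i) - grad_i(snapshot, i))
```

Averaging the estimate over i recovers full_grad(x) exactly, and at x = snapshot the estimate has zero variance; periodically refreshing the snapshot keeps the iterates close to it, which is the mechanism behind the fast rates cited above.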

