THE POWER OF REGULARIZATION IN SOLVING EXTENSIVE-FORM GAMES

Abstract

In this paper, we investigate the power of regularization, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast O(1/T) last-iterate convergence in terms of duality gap and distance to the set of Nash equilibria (NE), without the uniqueness assumption on the NE. Second, we show that regularized counterfactual regret minimization (Reg-CFR), with a variant of the optimistic mirror descent algorithm as its regret minimizer, can achieve O(1/T^{1/4}) best-iterate and O(1/T^{3/4}) average-iterate convergence rates for finding an NE in EFGs. Finally, we show that Reg-CFR can achieve asymptotic last-iterate convergence, and an optimal O(1/T) average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate for finding NE in non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.

1. INTRODUCTION

Extensive-form games (EFGs) are widely used to model sequential decision-making by multiple agents with imperfect information. Many popular real-world multi-agent learning problems can be modeled as EFGs, including Poker (Brown and Sandholm, 2018; 2019b), Scotland Yard (Schmid et al., 2021), Bridge (Tian et al., 2020), cloud computing (Kakkad et al., 2019), and auctions (Shubik, 1971). Despite the recent success of many of these applications, efficiently solving large-scale EFGs remains challenging. Solving EFGs typically refers to finding a Nash equilibrium (NE) of the game, especially in the two-player zero-sum setting. Over the past decades, the most popular methods for solving EFGs have arguably been regret-minimization-based methods, such as counterfactual regret minimization (CFR) (Zinkevich et al., 2007) and its variants (Tammelin et al., 2015; Brown and Sandholm, 2019a). By controlling the regret of each player, the average of the strategies constitutes an approximate NE in two-player zero-sum games, which is called average-iterate convergence (Zinkevich et al., 2007; Tammelin et al., 2015; Farina et al., 2019a).

However, averaging the strategies introduces additional representation and optimization errors when function approximation is used. For example, when using neural networks to parameterize the strategies, the averaged strategy may not be representable, and the optimization objective can be highly non-convex. Therefore, it is imperative to understand whether an (approximate) NE can be found efficiently without averaging, which motivates the study of last-iterate convergence. In fact, the popular CFR-type algorithms mentioned above only enjoy average-iterate convergence guarantees so far (Zinkevich et al., 2007; Tammelin et al., 2015; Farina et al., 2019a), and it is unclear whether such last-iterate convergence is achievable for this type of algorithms.
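To make the average-iterate phenomenon concrete, here is a minimal sketch of regret-matching self-play on a two-player zero-sum matrix game, the one-shot analogue of the CFR idea (the function names and the toy game are ours, not from the paper, and this is not the full extensive-form algorithm): the current strategies may keep oscillating, while their running averages approach an NE.

```python
import numpy as np

def regret_matching(A, T=5000):
    """Self-play with regret matching on the zero-sum matrix game
    max_x min_y x^T A y. Returns the *averaged* strategies, which
    converge to a Nash equilibrium (average-iterate convergence)."""
    m, n = A.shape
    rx, ry = np.zeros(m), np.zeros(n)      # cumulative regrets
    xs, ys = np.zeros(m), np.zeros(n)      # cumulative strategies
    x, y = np.ones(m) / m, np.ones(n) / n  # current strategies
    for _ in range(T):
        gx, gy = A @ y, A.T @ x            # players' payoff/loss vectors
        rx += gx - x @ gx                  # row player's instantaneous regrets
        ry += (y @ gy) - gy                # column player minimizes x^T A y
        xs += x
        ys += y
        # regret matching: play actions in proportion to positive regret
        px, py = np.maximum(rx, 0.0), np.maximum(ry, 0.0)
        x = px / px.sum() if px.sum() > 0 else np.ones(m) / m
        y = py / py.sum() if py.sum() > 0 else np.ones(n) / n
    return xs / T, ys / T                  # averaged strategies

def duality_gap(A, x, y):
    """Exploitability of (x, y): max row response minus min column response."""
    return (A @ y).max() - (A.T @ x).min()
```

Standard regret bounds give a duality gap of O(1/sqrt(T)) for the averaged strategies; the current iterates (x, y) inside the loop carry no such guarantee, which is exactly the gap that last-iterate analysis aims to close.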
The recent advances of optimistic mirror descent (OMD) (Rakhlin and Sridharan, 2013; Mertikopoulos et al., 2019; Wei et al., 2021; Cai et al., 2022) shed light on how to achieve last-iterate convergence for solving normal-form games (NFGs), a strict sub-class of EFGs. Last-iterate convergence in EFGs did not receive much attention until recently (Bowling et al., 2015; Farina et al., 2019c; Lee et al., 2021). Specifically, Bowling et al. (2015) provided some empirical evidence of last-iterate convergence for CFR-type algorithms, while Farina et al. (2019c) empirically demonstrated that OMD enjoys last-iterate convergence in EFGs. Lee et al. (2021) proposed an OMD variant with the first last-iterate convergence guarantees in EFGs, but the solution itself might have room for improvement: to make the update computationally efficient, the mirror map needs to be generated through a dilated operation (see §2 for more details), and in this case, the analysis in Lee et al. (2021) requires the NE to be unique. In particular, an important and arguably most well-studied instance of OMD for no-regret learning over the simplex, i.e., the optimistic multiplicative weights update (OMWU) (Daskalakis and Panageas, 2019; Wei et al., 2021), has not been shown to have an explicit last-iterate convergence rate without such a uniqueness condition, even for normal-form games; Anagnostides et al. (2022) can only guarantee asymptotic last-iterate convergence without the uniqueness assumption.[1] Indeed, it is left as an open question in Wei et al. (2021) whether the uniqueness condition is necessary for OMWU to converge with an explicit rate for this strict sub-class of EFGs when a constant stepsize is used. In this paper, we remove the uniqueness condition while establishing last-iterate convergence for dilated optimistic mirror descent (DOMD) type methods. The solution relies on exploiting the power of regularization techniques in EFGs.
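As a toy illustration of how regularization interacts with optimistic updates, the following sketch runs OMWU on an entropy-regularized zero-sum matrix game, i.e., the NFG special case. The paper's Reg-DOMD additionally handles the game tree via dilated regularizers and adapts the regularization weight over time, neither of which is modeled here; all names and constants are illustrative.

```python
import numpy as np

def reg_omwu(A, tau=0.05, eta=0.05, T=5000):
    """OMWU on the entropy-regularized game
    max_x min_y x^T A y - tau*h(x) + tau*h(y), with h(z) = sum z log z.
    The regularized game is strongly convex-concave and has a unique
    equilibrium, so the *last* iterate converges; the solution is an
    O(tau)-approximate NE of the original game."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    gx_prev = A @ y - tau * np.log(x)      # regularized gradient in x
    gy_prev = A.T @ x + tau * np.log(y)    # regularized gradient in y
    for _ in range(T):
        gx = A @ y - tau * np.log(x)
        gy = A.T @ x + tau * np.log(y)
        # optimistic step: predict the next gradient via 2*g_t - g_{t-1}
        x = x * np.exp(eta * (2 * gx - gx_prev)); x /= x.sum()
        y = y * np.exp(-eta * (2 * gy - gy_prev)); y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y                            # last iterates, no averaging

def duality_gap(A, x, y):
    """Exploitability of (x, y) in the *original* (unregularized) game."""
    return (A @ y).max() - (A.T @ x).min()
```

The entropy terms pull both iterates toward the interior of the simplex, which is what removes the reliance on a unique NE: the regularized equilibrium is always unique, and the remaining work (done in the paper, not in this sketch) is to control the bias introduced by tau.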
Our last-iterate convergence guarantee is not only for the convergence of the duality gap, a common metric in the literature, but also for the actual iterate, i.e., the convergence of the distance to the set of NE. This matches the bona fide last-iterate convergence studied in the literature, e.g., Daskalakis and Panageas (2019); Wei et al. (2021), and such a last-iterate guarantee was previously unknown when the mirror map is either dilated or entropy-based. More importantly, the techniques we develop can also be applied to CFR, resulting in the first last-iterate convergence guarantee for CFR-type algorithms. We detail our contributions as follows.

Contributions. Our contributions are mainly four-fold: (i) We develop a new type of dilated OMD algorithm, an efficient variant of OMD that exploits the structure of EFGs, with adaptive regularization (Reg-DOMD), and prove an explicit convergence rate of the duality gap without the uniqueness assumption on the NE. (ii) We further establish a last-iterate convergence rate for dilated optimistic multiplicative weights update to the NE of EFGs (beyond the duality gap, as in Cen et al. (2021b) for the NFG setting) when a constant stepsize is used. This also moves one step further towards resolving the open question in the NFG setting of whether the uniqueness assumption can be removed to prove last-iterate convergence of the original OMWU algorithm with constant stepsizes (Daskalakis and Panageas, 2019; Wei et al., 2021). (iii) For CFR-type algorithms, using the regularization technique, we establish the first best-iterate convergence rate of O(1/T^{1/4}) for finding the NE of non-perturbed EFGs, and asymptotic last-iterate convergence for finding the NE of perturbed EFGs in terms of duality gap, which is useful for finding approximate extensive-form perfect equilibria (EFPE) (Selten, 1975).
(iv) As a by-product of our analysis, we also provide a faster and optimal O(1/T) average-iterate convergence guarantee for finding the NE of perturbed EFGs (see the formal definition in §4.1), while also matching the state-of-the-art guarantees for CFR-type algorithms in finding NE of non-perturbed EFGs in terms of duality gap (Farina et al., 2019a).

Technical challenges. We emphasize the technical challenges we address as follows. First, by adding regularization to the original problem, Reg-DOMD will converge to the NE of the regularized



[1] A recent result (Anagnostides et al., 2022, Theorem 3.4) also gave a best-iterate convergence result with an explicit rate, but only an asymptotic convergence result for the last iterate.

