THE POWER OF REGULARIZATION IN SOLVING EXTENSIVE-FORM GAMES

Abstract

In this paper, we investigate the power of regularization, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $O(1/T)$ last-iterate convergence in terms of duality gap and distance to the set of Nash equilibria (NE), without the uniqueness assumption on the NE. Second, we show that regularized counterfactual regret minimization (Reg-CFR), with a variant of the optimistic mirror descent algorithm as its regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs. Finally, we show that Reg-CFR can achieve asymptotic last-iterate convergence, and an optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate for finding NE in non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.
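For concreteness, recall the standard definition of the duality gap used as the convergence measure above, stated here in the usual bilinear form where player $x$ maximizes and player $y$ minimizes the payoff $x^\top A y$ over strategy sets $\mathcal{X}$ and $\mathcal{Y}$:

\[
\mathrm{DualityGap}(x, y) \;=\; \max_{x' \in \mathcal{X}} (x')^\top A y \;-\; \min_{y' \in \mathcal{Y}} x^\top A y'.
\]

This quantity is nonnegative everywhere and zero exactly at an NE, so a last-iterate rate of $O(1/T)$ bounds it at the final pair of iterates $(x_T, y_T)$ rather than at their average.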

1. INTRODUCTION

Extensive-form games (EFGs) are widely used to model sequential decision-making by multiple agents under imperfect information. Many popular real-world multi-agent learning problems can be modeled as EFGs, including Poker (Brown and Sandholm, 2018; 2019b), Scotland Yard (Schmid et al., 2021), Bridge (Tian et al., 2020), cloud computing (Kakkad et al., 2019), and auctions (Shubik, 1971). Despite the recent success of many of these applications, efficiently solving large-scale EFGs remains challenging. Solving EFGs typically refers to finding a Nash equilibrium (NE) of the game, especially in the two-player zero-sum setting. Over the past decades, the most popular methods for solving EFGs have arguably been regret-minimization-based methods, such as counterfactual regret minimization (CFR) (Zinkevich et al., 2007) and its variants (Tammelin et al., 2015; Brown and Sandholm, 2019a). By controlling the regret of each player, the average of the strategies constitutes an approximate NE in two-player zero-sum games, which is called average-iterate convergence (Zinkevich et al., 2007; Tammelin et al., 2015; Farina et al., 2019a); see the sketch following this paragraph. However, averaging the strategies can be undesirable: it not only incurs more computation (Bowling et al., 2015) (additional memory and computation for the average strategy), but also intro-
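The following is a minimal, self-contained sketch (not an algorithm from this paper) of the average-iterate phenomenon just described: regret matching (Hart and Mas-Colell, 2000) run on a two-player zero-sum matrix game, where the matching-pennies payoff matrix and all variable names are illustrative choices. The duality gap of the averaged strategies shrinks over time, even though the individual iterates may keep cycling.

```python
import numpy as np

# Matching pennies: the row player maximizes x^T A y, the column player minimizes it.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def regret_matching(cum_regret):
    """Play proportionally to positive cumulative regrets; uniform if none are positive."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(pos, 1.0 / len(pos))

T = 100_000
reg_x, reg_y = np.zeros(2), np.zeros(2)   # cumulative regrets of each pure action
sum_x, sum_y = np.zeros(2), np.zeros(2)   # running sums for the average strategies

for _ in range(T):
    x, y = regret_matching(reg_x), regret_matching(reg_y)
    util_x = A @ y          # payoff of each row action against y (row maximizes)
    util_y = -(A.T @ x)     # payoff of each column action against x (column minimizes)
    reg_x += util_x - x @ util_x   # instantaneous regret: action payoff minus realized payoff
    reg_y += util_y - y @ util_y
    sum_x += x
    sum_y += y

avg_x, avg_y = sum_x / T, sum_y / T
# Duality gap of the averages: best row-response value minus best column-response value.
gap = (A @ avg_y).max() - (avg_x @ A).min()
print(f"duality gap of average strategies after T={T}: {gap:.6f}")  # -> close to 0
```

Replacing the matrix game with the tree-form strategy space of an EFG, and the per-action regrets with counterfactual regrets at each information set, gives CFR; the point is that the guarantee attaches to the averaged iterates, not the raw ones, which is exactly the cost the last-iterate results of this paper aim to remove.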

