LEARNING RATIONALIZABLE EQUILIBRIA IN MULTIPLAYER GAMES

Abstract

A natural goal in multi-agent learning is to learn rationalizable behavior, where players learn to avoid any Iteratively Dominated Action (IDA). However, standard no-regret-based equilibrium-finding algorithms can require exponentially many samples to find such rationalizable strategies. In this paper, we first propose a simple yet sample-efficient algorithm for finding a rationalizable action profile in multi-player general-sum games under bandit feedback, which substantially improves over the results of Wu et al. (2021). We further develop algorithms with the first efficient guarantees for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE). Our algorithms incorporate several novel techniques to guarantee the elimination of IDAs and no-(swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates, which may be of independent interest. We complement our results with a sample complexity lower bound showing the sharpness of our guarantees.

1. INTRODUCTION

A common objective in multi-agent learning is to find various equilibria, such as Nash equilibria (NE), correlated equilibria (CE), and coarse correlated equilibria (CCE). Generally speaking, a player in equilibrium lacks incentive to deviate, assuming the other players conform to the same equilibrium. Equilibrium learning has been extensively studied in the game theory and online learning literature, and no-regret learners can provably find approximate CE and CCE with both computational and statistical efficiency (Stoltz, 2005; Cesa-Bianchi & Lugosi, 2006). However, not all equilibria are created equal. As shown by Viossat & Zapechelnyuk (2013), a CCE can be entirely supported on dominated actions, i.e., actions that are worse than some other strategy in all circumstances, which rational agents should arguably never play. Approximate CE suffers from a similar problem: as shown by Wu et al. (2021, Theorem 1), there are examples where an ϵ-CE always plays iteratively dominated actions, i.e., actions that would be eliminated when iteratively deleting strictly dominated actions, unless ϵ is exponentially small. Standard no-regret algorithms are indeed prone to finding such undesirable solutions (Wu et al., 2021). The intrinsic reason is that CCE and approximate CE may not be rationalizable, and existing algorithms can indeed fail to find rationalizable solutions.

Unlike equilibrium notions, rationalizability (Bernheim, 1984; Pearce, 1984) looks at the game from the perspective of a single player who has no knowledge of the actual strategies of the other players and only assumes common knowledge of their rationality. A rationalizable strategy avoids strictly dominated actions and, assuming the other players have also eliminated their dominated actions, iteratively avoids strictly dominated actions in the resulting subgame.
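The iterative deletion process described above can be sketched concretely. The snippet below runs iterated elimination of strictly dominated actions on a two-player normal-form game; the 2×3 payoffs are a hypothetical textbook-style example (not from this paper), and for simplicity only dominance by pure strategies is checked.

```python
# Iterated elimination of strictly dominated actions in a two-player
# normal-form game. Payoffs are a hypothetical textbook-style example;
# for simplicity, only dominance by pure strategies is checked here.
U1 = {("Up", "Left"): 1, ("Up", "Middle"): 1, ("Up", "Right"): 0,
      ("Down", "Left"): 0, ("Down", "Middle"): 0, ("Down", "Right"): 2}
U2 = {("Up", "Left"): 0, ("Up", "Middle"): 2, ("Up", "Right"): 1,
      ("Down", "Left"): 3, ("Down", "Middle"): 1, ("Down", "Right"): 0}

def iterated_elimination(rows, cols):
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        # Remove any row action strictly worse than another surviving row
        # against every surviving column action (and symmetrically below).
        for a in rows[:]:
            if any(all(U1[(b, c)] > U1[(a, c)] for c in cols)
                   for b in rows if b != a):
                rows.remove(a)
                changed = True
        for c in cols[:]:
            if any(all(U2[(r, d)] > U2[(r, c)] for r in rows)
                   for d in cols if d != c):
                cols.remove(c)
                changed = True
    return rows, cols

rows, cols = iterated_elimination(["Up", "Down"], ["Left", "Middle", "Right"])
print(rows, cols)  # -> ['Up'] ['Middle']
```

Here "Right" is eliminated first, which exposes "Down" as dominated, which in turn exposes "Left": exactly the cascading structure that iteratively dominated actions exhibit.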
Rationalizability is a central solution concept in game theory (Osborne & Rubinstein, 1994) and has found applications in auctions (Battigalli & Siniscalchi, 2003) and mechanism design (Bergemann et al., 2011). If an (approximate) equilibrium only employs rationalizable actions, it rules out irrational behavior such as playing dominated actions. Such equilibria are arguably more reasonable than unrationalizable ones, and constitute a stronger solution concept. This motivates us to consider the following open question: Can we efficiently learn equilibria that are also rationalizable?

Despite its fundamental role in multi-agent reasoning, rationalizability was rarely studied from a learning perspective until recently, with Wu et al. (2021) giving the first algorithm for learning rationalizable strategies from bandit feedback. However, the problem of learning rationalizable CE and CCE remains a challenging open problem. Due to the existence of unrationalizable equilibria, running standard CE or CCE learners will not guarantee rationalizable solutions. On the other hand, one cannot hope to first identify all rationalizable actions and then find an equilibrium in the subgame, since even determining whether an action is rationalizable requires exponentially many samples (see Proposition 2). Therefore, achieving rationalizability and approximate equilibria simultaneously is nontrivial and presents new algorithmic challenges.

In this work, we address the challenges above and give a positive answer to our main question. Our contributions can be summarized as follows:

• As a first step, we provide a simple yet sample-efficient algorithm for identifying a ∆-rationalizable[1] action profile under bandit feedback, using only Õ(LNA/∆²)[2] samples in normal-form games with N players, A actions per player, and a minimum elimination length of L. This greatly improves the result of Wu et al. (2021) and is tight up to logarithmic factors when L = O(1).

• Using the above algorithm as a subroutine, we develop exponential-weights-based algorithms that provably find a ∆-rationalizable ϵ-CCE using Õ(LNA/∆² + NA/ϵ²) samples, and a ∆-rationalizable ϵ-CE using Õ(LNA/∆² + NA²/min{ϵ², ∆²}) samples. To the best of our knowledge, these are the first guarantees for learning rationalizable approximate CCE and CE.

• We also provide reduction schemes that find ∆-rationalizable ϵ-CCE/CE using black-box algorithms for finding ϵ-CCE/CE. Despite having slightly worse rates, these algorithms can directly leverage progress in equilibrium finding, and may be of independent interest.
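For readers unfamiliar with exponential weights under bandit feedback, the following sketch shows the classical EXP3-style update that such algorithms build on: only the chosen action's loss is observed, and an importance-weighted estimate feeds the update. This is a generic building block, not the paper's algorithm; the correlated exploration scheme and adaptive learning rates mentioned above are omitted, and the losses are hypothetical.

```python
import math, random

# EXP3-style exponential weights under bandit feedback: only the chosen
# action's loss is observed, and an importance-weighted estimate feeds the
# update. Generic building block only; the paper's algorithms add a
# correlated exploration scheme and adaptive learning rates on top.
random.seed(0)
K, T, eta = 3, 5000, 0.05
true_loss = [0.2, 0.8, 0.8]        # hypothetical losses; action 0 is best

w = [1.0] * K
for _ in range(T):
    total = sum(w)
    p = [wi / total for wi in w]
    a = random.choices(range(K), weights=p)[0]
    est = [0.0] * K
    est[a] = true_loss[a] / p[a]   # unbiased importance-weighted estimate
    w = [w[i] * math.exp(-eta * est[i]) for i in range(K)]

p = [wi / sum(w) for wi in w]
print(p)  # action 0 typically carries most of the probability mass
```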

1.1. RELATED WORK

Rationalizability and iterative dominance elimination. Rationalizability (Bernheim, 1984; Pearce, 1984) is a notion that captures rational reasoning in games and relaxes Nash equilibrium. It is closely related to the iterative elimination of dominated actions, which has been a focus of game-theory research since the 1950s (Luce & Raiffa, 1957). It can be shown that an action is rationalizable if and only if it survives iterative elimination of strictly dominated actions[3] (Pearce, 1984). There is also experimental evidence supporting iterative elimination of dominated strategies as a model of human reasoning (Camerer, 2011).
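As the footnote on this equivalence notes, dominance by mixed strategies must be allowed: an action can be strictly dominated by a mixture of two actions even though no single pure strategy dominates it. A minimal check with hypothetical payoffs:

```python
# Row player's payoffs against column actions L and R (hypothetical).
# B is not dominated by T or M alone, but is dominated by mixing them.
U = {"T": [3, 0], "M": [0, 3], "B": [1, 1]}

def pure_dominates(a, b):
    # a strictly dominates b: strictly better against every column action
    return all(ua > ub for ua, ub in zip(U[a], U[b]))

def mix_payoff(mix, j):
    # expected payoff of the mixed strategy `mix` against column action j
    return sum(prob * U[a][j] for a, prob in mix.items())

no_pure = not any(pure_dominates(a, "B") for a in ("T", "M"))
mix_dom = all(mix_payoff({"T": 0.5, "M": 0.5}, j) > U["B"][j] for j in (0, 1))
print(no_pure, mix_dom)  # -> True True
```

The uniform mixture of T and M earns 1.5 against either column, strictly beating B's payoff of 1, so B is eliminated even though neither T nor M dominates it on its own.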

Equilibria learning in games. There is a rich literature on applying online learning algorithms to learning equilibria in games. It is well known that if all agents have no regret, the resulting empirical average is an ϵ-CCE (Young, 2004), while if all agents have no swap regret, the resulting empirical average is an ϵ-CE (Hart & Mas-Colell, 2000; Cesa-Bianchi & Lugosi, 2006). Later work continuing this line of research includes faster convergence rates (Syrgkanis et al., 2015; Chen & Peng, 2020; Daskalakis et al., 2021), last-iterate convergence guarantees (Daskalakis & Panageas, 2018; Wei et al., 2020), and extensions to extensive-form games (Celli et al., 2020; Bai et al., 2022b;a; Song et al., 2022) and Markov games (Song et al., 2021; Jin et al., 2021).

Computational and learning aspects of rationalizability. Despite its conceptual importance, rationalizability and iterative dominance elimination are not well studied from a computational or learning perspective. For iterative strict dominance elimination in two-player games, Knuth et al. (1988) provided a cubic-time algorithm and proved that the problem is P-complete. The weak-dominance version of the problem was proven NP-complete by Conitzer & Sandholm (2005).

[1] An action is ∆-rationalizable if it survives iterative elimination of ∆-dominated actions; cf. Definition 1.
[2] Throughout this paper, we use Õ to suppress logarithmic factors in N, A, L, 1/∆, 1/δ, and 1/ϵ.
[3] For this equivalence to hold, we need to allow dominance by mixed strategies, and correlated beliefs when there are more than two players. These conditions are met in the setting of this work.
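The classical connection between no-regret play and CCE can be illustrated with a minimal self-play simulation: two players run Hedge (exponential weights with full feedback) against each other, and the time-averaged joint play is an approximate CCE. The 2×2 payoffs below are hypothetical and the example is a sketch of the classical result, not of this paper's algorithms.

```python
import math

# Two players run Hedge (exponential weights, full feedback) in self-play
# on a 2x2 general-sum game with hypothetical payoffs in [0, 1]; the
# time-averaged joint play mu forms an approximate CCE.
u1 = [[0.0, 1.0], [0.5, 0.2]]      # player 1 payoffs (hypothetical)
u2 = [[0.6, 0.1], [0.3, 0.9]]      # player 2 payoffs (hypothetical)
T, K = 2000, 2
eta = math.sqrt(8 * math.log(K) / T)

w1, w2 = [1.0] * K, [1.0] * K
mu = [[0.0] * K for _ in range(K)]  # empirical joint distribution of play
for _ in range(T):
    p = [x / sum(w1) for x in w1]
    q = [x / sum(w2) for x in w2]
    for i in range(K):
        for j in range(K):
            mu[i][j] += p[i] * q[j] / T
    # expected losses (loss = 1 - payoff) against the opponent's mixture
    l1 = [sum(q[j] * (1 - u1[i][j]) for j in range(K)) for i in range(K)]
    l2 = [sum(p[i] * (1 - u2[i][j]) for i in range(K)) for j in range(K)]
    w1 = [w1[i] * math.exp(-eta * l1[i]) for i in range(K)]
    w2 = [w2[j] * math.exp(-eta * l2[j]) for j in range(K)]

# CCE gap for player 1: best fixed deviation minus the value under mu.
val1 = sum(mu[i][j] * u1[i][j] for i in range(K) for j in range(K))
dev1 = max(sum(mu[i][j] * u1[a][j] for i in range(K) for j in range(K))
           for a in range(K))
print(dev1 - val1)  # small: mu is an approximate CCE for player 1
```

Hedge's regret bound caps the deviation gap at roughly sqrt(ln K / (2T)) per round on average, so the gap shrinks as T grows. Note that, as discussed in the introduction, nothing in this construction prevents mu from putting mass on iteratively dominated actions.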

