LEARNING RATIONALIZABLE EQUILIBRIA IN MULTIPLAYER GAMES

Abstract

A natural goal in multi-agent learning is to learn rationalizable behavior, where players learn to avoid any Iteratively Dominated Action (IDA). However, standard no-regret-based equilibrium-finding algorithms can require exponentially many samples to find such rationalizable strategies. In this paper, we first propose a simple yet sample-efficient algorithm for finding a rationalizable action profile in multi-player general-sum games under bandit feedback, which substantially improves over the results of Wu et al. (2021). We further develop algorithms with the first efficient guarantees for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE). Our algorithms incorporate several novel techniques to simultaneously guarantee the elimination of IDAs and no (swap-)regret, including a correlated exploration scheme and adaptive learning rates, which may be of independent interest. We complement our results with a sample complexity lower bound showing the sharpness of our guarantees.

1. INTRODUCTION

A common objective in multi-agent learning is to find various equilibria, such as Nash equilibria (NE), correlated equilibria (CE), and coarse correlated equilibria (CCE). Generally speaking, a player in equilibrium lacks incentive to deviate, assuming the other players conform to the same equilibrium. Equilibrium learning has been extensively studied in the game theory and online learning literature, and no-regret learners can provably learn approximate CE and CCE with both computational and statistical efficiency (Stoltz, 2005; Cesa-Bianchi & Lugosi, 2006). However, not all equilibria are created equal. As shown by Viossat & Zapechelnyuk (2013), a CCE can be entirely supported on dominated actions, i.e., actions that yield strictly lower payoff than some other strategy regardless of the opponents' play, which rational agents should clearly never play. Approximate CE suffers from a similar problem: as shown by Wu et al. (2021, Theorem 1), there are examples where an ϵ-CE always plays iteratively dominated actions, i.e., actions that would be removed during the iterated deletion of strictly dominated actions, unless ϵ is exponentially small (see the definitions recalled below). Standard no-regret algorithms are indeed prone to finding such undesirable solutions (Wu et al., 2021). The intrinsic reason is that CCE and approximate CE may fail to be rationalizable, and existing algorithms can indeed fail to find rationalizable solutions.

Unlike equilibrium notions, rationalizability (Bernheim, 1984; Pearce, 1984) views the game from the perspective of a single player who does not know the other players' actual strategies and assumes only common knowledge of their rationality. A rationalizable strategy avoids strictly dominated actions and, assuming the other players have likewise eliminated their dominated actions, iteratively avoids actions that become strictly dominated in the resulting subgame. Rationalizability is a central solution concept in game theory (Osborne & Rubinstein, 1994) and has found applications in auctions (Battigalli & Siniscalchi, 2003) and mechanism design (Bergemann et al., 2011). If an (approximate) equilibrium employs only rationalizable actions, it prevents irrational behavior such as playing dominated actions. Such equilibria are arguably more reasonable than those that are not rationalizable.
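For concreteness, we recall the standard definitions underlying the discussion above; the notation here is ours and may differ from that of the cited works. A distribution π over joint actions a = (a_1, ..., a_n) is an ϵ-CCE if no player can gain more than ϵ by deviating to a fixed action, and an ϵ-CE if no player can gain more than ϵ by applying any swap function to its recommended action:

$$
\mathbb{E}_{a \sim \pi}\big[u_i(a)\big] \;\ge\; \max_{a_i' \in \mathcal{A}_i} \mathbb{E}_{a \sim \pi}\big[u_i(a_i', a_{-i})\big] - \epsilon \qquad (\epsilon\text{-CCE}),
$$

$$
\mathbb{E}_{a \sim \pi}\big[u_i(a)\big] \;\ge\; \max_{\phi_i : \mathcal{A}_i \to \mathcal{A}_i} \mathbb{E}_{a \sim \pi}\big[u_i(\phi_i(a_i), a_{-i})\big] - \epsilon \qquad (\epsilon\text{-CE}),
$$

for every player i, where u_i is player i's utility and a_{-i} denotes the other players' actions. Since a fixed deviation is a constant swap function, every ϵ-CE is an ϵ-CCE. Crucially, neither constraint prevents π from placing mass on (iteratively) dominated actions, which is exactly the failure mode described above.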
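To make the iterated-elimination procedure concrete, the following is a minimal sketch in Python; the function name `iterated_elimination` and the payoff matrices are hypothetical and for illustration only. For simplicity the sketch only checks domination by pure actions, whereas full rationalizability also eliminates actions dominated by mixed strategies, which would additionally require solving a small linear program per action.

```python
import numpy as np

def iterated_elimination(payoffs, tol=1e-9):
    """Iteratively delete strictly dominated actions in an n-player game.

    payoffs: list of n numpy arrays; payoffs[i] has shape (A_1, ..., A_n)
             and stores player i's utility for every joint action.
    Returns: the surviving (rationalizable under pure domination) action
             indices for each player.
    """
    n = len(payoffs)
    alive = [list(range(payoffs[i].shape[i])) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            # Restrict player i's payoff tensor to the current subgame.
            sub = payoffs[i]
            for j in range(n):
                sub = np.take(sub, alive[j], axis=j)
            sub = np.moveaxis(sub, i, 0)          # own actions on axis 0
            flat = sub.reshape(sub.shape[0], -1)  # columns: opponents' joint actions
            # Action k is strictly dominated if some action m does strictly
            # better against every surviving joint action of the opponents.
            dominated = {
                k
                for k in range(len(alive[i]))
                for m in range(len(alive[i]))
                if m != k and np.all(flat[m] > flat[k] + tol)
            }
            if dominated:
                alive[i] = [a for idx, a in enumerate(alive[i])
                            if idx not in dominated]
                changed = True
    return alive

# Hypothetical 2x3 game that is dominance solvable: player 2's third
# action is strictly dominated; once it is deleted, player 1's second
# action becomes dominated, and then player 2's second action.
u1 = np.array([[3, 1, 0],
               [2, 0, 4]])
u2 = np.array([[3, 2, 0],
               [2, 3, 0]])
print(iterated_elimination([u1, u2]))  # -> [[0], [0]]
```

In this toy game a single rationalizable action profile survives. By contrast, the algorithms in this paper must identify rationalizable behavior from bandit feedback, without access to the full payoff tensors assumed by the sketch above.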

