PROVABLE FICTITIOUS PLAY FOR GENERAL MEAN-FIELD GAMES

Abstract

We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium. Viewing the mean-field state and the policy as two players, we propose a fictitious play algorithm that alternately updates the mean-field state and the policy via gradient descent and proximal policy optimization, respectively. Our algorithm stands in stark contrast with the previous literature, which solves to optimality each single-agent reinforcement learning problem induced by the iterate mean-field states. Furthermore, we prove that our fictitious play algorithm converges to the Nash equilibrium at a sublinear rate. To the best of our knowledge, this appears to be the first provably convergent reinforcement learning algorithm for mean-field games based on iterative updates of both the mean-field state and the policy.
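The alternating structure described above can be illustrated on a toy congestion game. The following is a minimal sketch, not the paper's algorithm: the state space, reward, and transition dynamics are hypothetical, and proximal policy optimization is replaced by a plain softmax policy-gradient step for brevity.

```python
import numpy as np

def fictitious_play_sketch(n_states=2, steps=500, lr_mu=0.05, lr_pi=0.5, seed=0):
    """Illustrative alternating updates for a toy stationary MFG.

    Hypothetical setup: states coincide with actions, and choosing action a
    moves the agent to state a. The reward r(s, a, mu) = -mu[a] penalizes
    crowded states, so the Nash equilibrium mean-field state is uniform.
    """
    rng = np.random.default_rng(seed)
    mu = rng.dirichlet(np.ones(n_states))   # mean-field state (distribution)
    logits = np.zeros(n_states)             # stationary softmax policy params
    for _ in range(steps):
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()
        # Policy player: one softmax policy-gradient step on the expected
        # reward J(pi) = sum_a pi[a] * q[a], with q[a] = -mu[a].
        q = -mu
        advantage = q - pi @ q
        logits += lr_pi * pi * advantage
        # Mean-field player: gradient-descent step moving mu toward the
        # state distribution induced by the current policy.
        mu += lr_mu * (pi - mu)
    return mu, pi
```

Because the policy shifts mass toward less crowded states while the mean-field state tracks the policy, both iterates drift toward the uniform equilibrium, mirroring the alternating-update scheme (with simplified sub-steps) rather than solving either player's problem to optimality per iteration.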

1. INTRODUCTION

Multi-agent reinforcement learning (MARL) (Shoham et al., 2007; Busoniu et al., 2008; Hernandez-Leal et al., 2017; Hernandez-Leal et al.; Zhang et al., 2019) aims to tackle sequential decision-making problems in multi-agent systems (Wooldridge, 2009) by integrating the classical reinforcement learning framework (Sutton & Barto, 2018) with game-theoretical thinking (Başar & Olsder, 1998). Powered by deep learning (Goodfellow et al., 2016), MARL has recently achieved striking empirical successes in games (Silver et al., 2016; 2017; Vinyals et al., 2019; Berner et al., 2019; Schrittwieser et al., 2019), robotics (Yang & Gu, 2004; Busoniu et al., 2006; Leottau et al., 2018), transportation (Kuyer et al., 2008; Mannion et al., 2016), and social science (Leibo et al., 2017; Jaques et al., 2019; Cao et al., 2018; McKee et al., 2020). Despite these empirical successes, MARL is known to suffer from a scalability issue. Specifically, in a multi-agent system, each agent interacts with the other agents as well as the environment, with the goal of maximizing its own expected total return. Consequently, for each agent, the reward function and the transition kernel of its local state also involve the local states and actions of all the other agents. As the number of agents increases, the size of the joint state-action space grows exponentially, which brings tremendous difficulty to reinforcement learning algorithms due to the need to handle high-dimensional input spaces. Such a curse of dimensionality, arising from a large number of agents in the system, is termed the "curse of many agents" (Sonu et al., 2017).
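The exponential blow-up behind the curse of many agents can be made concrete with one line of arithmetic (the numbers below are illustrative, not from the paper):

```python
def joint_space_size(n_local_states, n_local_actions, n_agents):
    """Size of the joint state-action space of an N-agent system in which
    every agent has the same local state and action spaces: with |S| local
    states and |A| local actions per agent, the joint space has
    (|S| * |A|) ** N elements, growing exponentially in N."""
    return (n_local_states * n_local_actions) ** n_agents
```

For instance, with 10 local states and 4 local actions per agent, 2 agents yield a joint space of 1,600 state-action pairs, while 10 agents already yield 40**10, on the order of 10**16.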
To circumvent this notorious curse, a popular approach is mean-field approximation, which imposes symmetry among the agents and specifies that, for each agent, the joint effect of all the other agents is summarized by a population quantity, often given by the empirical distribution of the local states and actions of all the other agents, or by a functional of such an empirical distribution. Specifically, symmetry is obtained by endowing every agent with the same reward function and local state transition kernel, both of which depend only on the agent's local state-action pair and the population quantity. Thanks to mean-field approximation, such a multi-agent system, known as a mean-field game (MFG) (Huang et al., 2003; Lasry & Lions, 2006a; b; 2007; Huang et al., 2007; Guéant et al., 2011; Carmona & Delarue, 2018), is readily scalable to an arbitrary number of agents. In this work, we aim to find the Nash equilibrium (Nash, 1950) of an MFG with an infinite number of agents via reinforcement learning. By mean-field approximation, such a game consists of a population of symmetric agents, among which each individual agent has an infinitesimal effect over the

