POPULATION-SIZE-AWARE POLICY OPTIMIZATION FOR MEAN-FIELD GAMES

Abstract

In this work, we attempt to bridge the fields of finite-agent and infinite-agent games by studying how agents' optimal policies evolve with the number of agents (population size) in mean-field games. This agent-centric perspective contrasts with existing works, which typically focus on the convergence of the empirical distribution of the population. To this end, we must first obtain the optimal policies for a set of finite-agent games with different population sizes. However, deriving a closed-form solution for each game is theoretically intractable, training a distinct policy for each game is computationally intensive, and directly applying a policy trained on one game to other games is sub-optimal. We address these challenges with Population-size-Aware Policy Optimization (PAPO). Our contributions are three-fold. First, to efficiently generate high-quality policies for games with different population sizes, we propose PAPO, which unifies two natural options (augmentation and hypernetwork) and achieves significantly better performance. PAPO consists of three components: i) a population-size encoding that transforms the raw value of the population size into an equivalent encoding to avoid training collapse; ii) a hypernetwork that generates a distinct policy for each game, conditioned on the population size; and iii) the population size as an additional input to the generated policy. Next, we construct a multi-task-based training procedure that efficiently trains the neural networks of PAPO by sampling data from multiple games with different population sizes. Finally, extensive experiments on multiple environments show that PAPO is significantly superior to the baselines, and our analysis of the evolution of the generated policies further deepens the understanding of finite-agent and infinite-agent games.
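The three components above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a hypothetical sinusoidal log-scale encoding of the population size, a small MLP hypernetwork, and a generated linear-softmax policy head that also receives the encoding as an extra input.

```python
import numpy as np

def encode_population_size(n, dim=8, n_max=1000):
    """Component i): map the raw integer population size n to a bounded
    vector (a sinusoidal encoding of log-scaled n, an assumed choice)
    so that widely different magnitudes of n give comparable inputs."""
    x = np.log(n) / np.log(n_max)            # roughly in [0, 1] for n <= n_max
    freqs = 2.0 ** np.arange(dim // 2)       # geometric frequency ladder
    return np.concatenate([np.sin(freqs * np.pi * x),
                           np.cos(freqs * np.pi * x)])

class HyperPolicy:
    """Component ii): a hypernetwork that emits a distinct policy's
    parameters conditioned on the population-size encoding.
    Component iii): the generated policy also takes the encoding as an
    additional input alongside the observation."""

    def __init__(self, obs_dim, act_dim, enc_dim=8, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.obs_dim, self.act_dim, self.enc_dim = obs_dim, act_dim, enc_dim
        in_dim = obs_dim + enc_dim                 # policy input: obs + encoding
        n_params = in_dim * act_dim + act_dim      # weights + biases to generate
        # Hypernetwork: small MLP from encoding to the policy's parameters.
        self.W1 = rng.normal(0.0, 0.1, (enc_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_params))
        self.b2 = np.zeros(n_params)

    def policy(self, obs, n):
        """Return an action distribution for observation `obs` in the
        game with population size `n`."""
        enc = encode_population_size(n, self.enc_dim)
        h = np.tanh(enc @ self.W1 + self.b1)
        params = h @ self.W2 + self.b2             # generated policy parameters
        in_dim = self.obs_dim + self.enc_dim
        W = params[: in_dim * self.act_dim].reshape(in_dim, self.act_dim)
        b = params[in_dim * self.act_dim:]
        logits = np.concatenate([obs, enc]) @ W + b
        exp = np.exp(logits - logits.max())        # numerically stable softmax
        return exp / exp.sum()
```

In this sketch, querying the same observation under different population sizes yields different action distributions, since both the generated parameters and the policy's extra input depend on the encoding; in the actual method, the hypernetwork is trained with a multi-task RL procedure rather than left at random initialization.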



1 INTRODUCTION

Games involving a finite number of agents have been extensively investigated, ranging from board games such as Go (Silver et al., 2016; 2018), Poker (Brown & Sandholm, 2018; 2019; Moravčík et al., 2017), and Chess (Campbell et al., 2002) to real-time strategy games such as StarCraft II (Vinyals et al., 2019) and Dota 2 (Berner et al., 2019). However, existing works are typically limited to a handful of agents, which hinders them from broader applications. To break the curse of many agents (Wang et al., 2020), the mean-field game (MFG) (Huang et al., 2006; Lasry & Lions, 2007) was introduced to study games that involve an infinite number of agents. Recently, benefiting from reinforcement learning (RL) (Sutton & Barto, 2018) and deep RL (Lillicrap et al., 2016; Mnih et al., 2015), MFG provides a versatile framework for modeling games with large populations of agents (Cui & Koeppl, 2022; Fu et al., 2019; Guo et al., 2019; Laurière et al., 2022; Perolat et al., 2021; Perrin et al., 2022; 2020; Yang et al., 2018).



Figure 1: Experiments on the Taxi Matching environment show the failure of two naive methods and the success of our PAPO. ↓ indicates that lower is better. See Sec. 5.1 for details.

