IMITATION LEARNING FOR MEAN FIELD GAMES WITH CORRELATED EQUILIBRIA

Abstract

Imitation learning (IL) aims to achieve optimal actions by learning from demonstrated behaviors without knowledge of the reward function or transition kernels. Conducting IL with a large population of agents is challenging, as the number of agent interactions grows exponentially with the population size. Mean field theory provides an efficient tool for studying multi-agent problems by aggregating information at the population level. While this approximation is tractable, it is non-trivial to recover mean field Nash equilibria (MFNE) from demonstrations. Importantly, many real-world problems cannot be explained by the classic MFNE concept; examples include the traffic network equilibrium induced by public routing recommendations and the pricing equilibrium of goods on e-commerce platforms. In both examples, correlation devices are introduced into the equilibrium by the platform's intervention. To accommodate this, we propose a novel solution concept named adaptive mean field correlated equilibrium (AMFCE) that generalizes MFNE. On the theory side, we first prove the existence of AMFCE, and establish a novel framework based on IL and AMFCE with entropy regularization (MaxEnt-AMFCE) to recover the AMFCE policy from real-world demonstrations. Signatures from rough path theory are then applied to characterize the mean-field evolution. A significant benefit of our framework is that it can recover both the equilibrium policy and the correlation device from data. We test our framework against state-of-the-art IL algorithms for MFGs on several tasks (including a real-world traffic flow prediction problem); the results justify the effectiveness of our proposed method and show its potential to predict and explain large-population behavior under correlated signals.

1. INTRODUCTION

Imitation learning (IL) (Hussein et al., 2017) has been widely adopted to learn desired behavior from expert demonstrations and has led to a series of impressive successes (Silver et al., 2016; Shi et al., 2019; Shang et al., 2019). Existing imitation learning algorithms cannot handle tasks with a large group of agents due to the curse of dimensionality and the exponential growth of agent interactions as the number of agents increases. However, many real-world scenarios require the algorithm to handle a large population. Examples include traffic management and control (Bazzan, 2009), ad auctions (Guo et al., 2019), online businesses with a large customer base (Ahn et al., 2007), and social behaviors between game bots and humans (Jeong et al., 2015). For systems with a large population of homogeneous agents, mean field theory provides a practically efficient and analytically tractable approach to analyzing otherwise challenging multi-agent games (Guo et al., 2019; Yang et al., 2018b). In the mean field game (MFG) setting, thanks to the homogeneity property, the states of the entire population can be sufficiently summarized by an empirical distribution of states. It therefore suffices to consider a game between a representative agent and this empirical distribution. The existing (and rather limited) literature on mean-field IL assumes that the expert demonstrations are sampled from the classic mean field Nash equilibrium (MFNE) (Yang et al., 2018a; Chen et al., 2022). However, this framework is not general enough to capture many real-world situations where external, correlated signals influence the behavior of the entire population. Examples include the behavior of drivers on a traffic network who receive routing recommendations from Google Maps or Apple Maps. Another possible example is the e-commerce platform recommendation for individual
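To make the mean-field aggregation concrete, the following is a minimal sketch (not from the paper; the function name and toy state space are illustrative) of how the joint state of N homogeneous agents on a finite state space collapses into a single empirical distribution, which is the only population-level object the representative agent interacts with.

```python
import numpy as np

def empirical_state_distribution(agent_states, num_states):
    """Summarize N individual agent states as a mean-field distribution.

    Instead of tracking all pairwise interactions (which grow with N),
    a representative agent only needs this length-`num_states` vector.
    """
    counts = np.bincount(agent_states, minlength=num_states)
    return counts / len(agent_states)

# Toy example: 6 agents on a 4-state space.
states = np.array([0, 1, 1, 2, 1, 3])
mu = empirical_state_distribution(states, num_states=4)
print(mu)  # [0.16666667 0.5        0.16666667 0.16666667]
```

The key point is that `mu` has fixed dimension regardless of the number of agents, which is what makes the mean-field approximation tractable for large populations.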

