IMITATION LEARNING FOR MEAN FIELD GAMES WITH CORRELATED EQUILIBRIA

Abstract

Imitation learning (IL) aims to achieve optimal behavior by learning from demonstrations, without knowledge of the reward function or transition kernels. Conducting IL with a large population of agents is challenging because the number of agent interactions grows exponentially with the population size. Mean field theory provides an efficient tool for studying multi-agent problems by aggregating information at the population level. While the approximation is tractable, it is non-trivial to recover mean field Nash equilibria (MFNE) from demonstrations. Importantly, many real-world problems cannot be explained by the classic MFNE concept, including the traffic network equilibrium induced by public routing recommendations and the pricing equilibrium of goods on e-commerce platforms. In both examples, the platform's intervention introduces a correlation device into the equilibrium. To accommodate this, we propose a novel solution concept named adaptive mean field correlated equilibrium (AMFCE) that generalizes MFNE. On the theory side, we first prove the existence of AMFCE, and then establish a novel framework based on IL and AMFCE with entropy regularization (MaxEnt-AMFCE) to recover the AMFCE policy from real-world demonstrations. Signatures from rough path theory are then applied to characterize the mean-field evolution. A significant benefit of our framework is that it can recover both the equilibrium policy and the correlation device from data. We test our framework against state-of-the-art IL algorithms for MFGs on several tasks (including a real-world traffic flow prediction problem); the results justify the effectiveness of our proposed method and show its potential for predicting and explaining large-population behavior under correlated signals.

1. INTRODUCTION

Imitation learning (IL) (Hussein et al., 2017) has been widely adopted to learn desired behaviors from expert demonstrations and has led to a series of impressive successes (Silver et al., 2016; Shi et al., 2019; Shang et al., 2019). Existing imitation learning algorithms cannot handle tasks with a large group of agents due to the curse of dimensionality and the exponential growth of agent interactions as the number of agents increases. However, many real-world scenarios require algorithms that can handle a large population. Examples include traffic management and control (Bazzan, 2009), ad auctions (Guo et al., 2019), online businesses with large customer bases (Ahn et al., 2007), and social behaviors between game bots and humans (Jeong et al., 2015). For systems with a large population of homogeneous agents, mean field theory provides a practically efficient and analytically tractable approach to otherwise challenging multi-agent games (Guo et al., 2019; Yang et al., 2018b). In the mean field game (MFG) setting, the states of the entire population can be sufficiently summarized by an empirical distribution of states thanks to the homogeneity property. It therefore suffices to consider a game between a representative agent and an empirical distribution. The existing (and rather limited) literature on mean-field IL assumes that the expert demonstrations are sampled from the classic mean field Nash equilibrium (MFNE) (Yang et al., 2018a; Chen et al., 2022). However, this framework is not general enough to capture many real-world situations where external, correlated signals influence the behavior of the entire population. Examples include the behavior of drivers on a traffic network who receive routing recommendations from Google Maps or Apple Maps, and an e-commerce platform recommending prices at which individual sellers should list their products.
In these two examples, a mediator (or coordinator) recommends decisions, but individual agents acting in their own interest may deviate from the recommendation if they find a better option given the available information. The existence of the mediator introduces correlations among the behaviors of individual agents. Therefore, a more general equilibrium concept is needed before we take a further step to learn from expert demonstrations. Inspired by the concept of correlated equilibrium (CE) for stateless games (Aumann, 1974), there have been recent developments on mean field correlated equilibrium (MFCE) with state dynamics. Campi and Fischer assume that a mediator recommends the same stochastic policy to the entire population, resulting in a limited equilibrium set that coincides with the classic MFNE (Campi & Fischer, 2022). Moreover, it is often more practical for the mediator to recommend an action rather than a stochastic policy to individuals (see the traffic routing and e-commerce examples). Muller et al. assume that the mediator recommends a time-independent, deterministic policy (sampled from some distribution over the deterministic policy space) to each individual (Muller et al., 2022). This formulation is also rather limited in describing the behaviors of many real-world applications and in enabling sufficient flexibility of the population behavior. A more general and practical setting is a framework in which the mediator samples a stochastic policy based on time-dependent signals and recommends an action to each individual; this is exactly the framework investigated in this paper. (See Appendix H for a concrete example showing that our equilibrium concept is more general than that of Muller et al. (Muller et al., 2022).)
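To make the correlated-equilibrium idea concrete, the following toy sketch (our own illustration, not a construction from this paper) checks the incentive constraints of a correlation device in a classic two-player "chicken"-style intersection game: the device samples a joint recommendation, and at equilibrium no player can gain in expectation by deviating from her recommended action.

```python
import numpy as np

# Actions: Go = 0, Yield = 1. payoff_1[a1, a2] is player 1's payoff;
# the game is symmetric, so payoff_2[a1, a2] = payoff_1[a2, a1].
payoff_1 = np.array([[0.0, 7.0],
                     [2.0, 6.0]])
payoff_2 = payoff_1.T

# Correlation device: a distribution over joint recommendations.
# A classic CE for chicken puts no mass on (Go, Go).
device = np.array([[0.0, 1 / 3],
                   [1 / 3, 1 / 3]])  # rows: rec. to player 1, cols: to player 2

def is_correlated_equilibrium(device, payoff_1, payoff_2, tol=1e-9):
    """CE incentive constraints: conditional on her recommendation,
    no player can improve in expectation by playing another action."""
    n = device.shape[0]
    for rec in range(n):           # recommendation to player 1
        cond = device[rec, :]      # (unnormalized) weight over player 2's action
        if cond.sum() == 0:
            continue
        obeyed = cond @ payoff_1[rec, :]
        for dev in range(n):
            if cond @ payoff_1[dev, :] > obeyed + tol:
                return False
    for rec in range(n):           # recommendation to player 2
        cond = device[:, rec]
        if cond.sum() == 0:
            continue
        obeyed = cond @ payoff_2[:, rec]
        for dev in range(n):
            if cond @ payoff_2[:, dev] > obeyed + tol:
                return False
    return True

print(is_correlated_equilibrium(device, payoff_1, payoff_2))  # → True
```

Note that the equilibrium distribution over joint actions here cannot be written as a product of independent strategies, which is precisely the extra flexibility a correlation device buys over a Nash equilibrium.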
Given the above-mentioned limitations of existing MFCE concepts and mean-field IL approaches, we propose a new MFCE framework named adaptive mean field correlated equilibrium (AMFCE), with time-dependent correlated signals, in which an individual agent can adaptively update her belief about the unobserved correlated signal. We develop a method to recover the AMFCE policy based on maximum entropy regularization. Our framework has the following important and novel ingredients:
• Novel MFCE concept with time-dependent correlated signals and adaptive belief updates by individual agents. We propose a new MFCE framework (called AMFCE) in which the mediator recommends, to each agent at every time step, an action sampled from a stochastic policy. This is a more general and flexible framework than previous work on MFCE (Muller et al., 2022; Campi & Fischer, 2022). We prove the existence of AMFCE solutions under mild conditions and prove that MFNE is a subclass of AMFCE.
• Entropy regularization to overcome the equilibrium selection difficulty. Most IL algorithms for games face an equilibrium selection (or identifiability) issue, as there often exist multiple equilibria. To bypass this difficulty, we propose an entropy-regularized AMFCE (MaxEnt-AMFCE) framework, which is shown to have a unique solution.
• Signatures from rough path theory to efficiently represent the mean-field evolution. Mean field information is often inaccessible in practice, and approximating it by its empirical distribution is computationally expensive. To overcome this difficulty, we adopt signatures from rough path theory to represent the mean-field evolution; signatures combine easily with neural network training architectures, and the resulting method is computationally efficient.
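To illustrate the signature representation used in the third ingredient, the following minimal sketch (our own illustration; the function name and depth truncation are assumptions, not this paper's implementation) computes the depth-2 signature of a piecewise-linear path using Chen's identity. Level 1 is the total increment of the path; level 2 collects the iterated integrals S^{ij} = ∫∫_{s<t} dX^i_s dX^j_t, whose antisymmetric part encodes the Lévy area of the path.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path.

    path: (n_points, d) array of samples; each consecutive pair of
    points is treated as a linear segment.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)         # level-1 signature (total increment)
    s2 = np.zeros((d, d))    # level-2 signature (iterated integrals)
    for k in range(len(path) - 1):
        delta = path[k + 1] - path[k]  # increment of this linear segment
        # Chen's identity: concatenating the running path with one linear
        # segment updates the signature multiplicatively; truncated at
        # depth 2, a segment contributes (delta, delta ⊗ delta / 2).
        s2 = s2 + np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 = s1 + delta
    return s1, s2

# Sanity check via the shuffle identity: for any path,
# S^{ij} + S^{ji} = S^i * S^j.
path = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 2.0], [2.0, 1.0]])
s1, s2 = signature_depth2(path)
assert np.allclose(s1, path[-1] - path[0])
assert np.allclose(s2 + s2.T, np.outer(s1, s1))
```

In practice one would truncate at a higher depth and feed the flattened signature terms into a neural network as a fixed-length feature vector for the (time-varying) mean-field flow; dedicated libraries exist for computing signatures efficiently.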
With all these ingredients, our correlated mean field imitation learning (CMFIL) framework can recover not only the policy but also the correlation device, i.e., the distribution from which the correlated signal is sampled. To the best of our knowledge, this paper is the first to study MFCE with a correlation device that provides time-dependent recommendations and allows adaptive belief updates by individual agents. In addition, we evaluate our framework against state-of-the-art imitation learning algorithms for MFGs on several tasks, including a real-world traffic flow prediction problem. The experimental results demonstrate that our framework outperforms the baselines on all tasks. As a by-product, our framework is also suitable for solving MFNE, since MFNE is a subclass of AMFCE.
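As a minimal sketch of the recommendation mechanism described above (all names, sizes, and dynamics here are hypothetical, chosen only for illustration and not taken from this paper), the following simulation draws a time-dependent correlated signal from a correlation device and recommends each agent an action from a stochastic policy conditioned on that signal; the agents themselves never observe the signal, only their own states and recommendations, while the empirical state distribution plays the role of the mean field.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_states, n_actions, n_signals, horizon = 1000, 3, 2, 2, 5

# Correlation device: distribution of the hidden signal z_t (i.i.d. here
# for simplicity; in general it may depend on time and history).
device = np.array([0.7, 0.3])
# Stochastic policy pi(a | s, z): shape (n_signals, n_states, n_actions).
policy = rng.dirichlet(np.ones(n_actions), size=(n_signals, n_states))
# Shared transition kernel P(s' | s, a): shape (n_states, n_actions, n_states).
transition = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

states = rng.integers(n_states, size=n_agents)
for t in range(horizon):
    z = rng.choice(n_signals, p=device)      # hidden correlated signal
    # Mediator recommends each agent an action drawn from pi(. | s_i, z_t).
    probs = policy[z, states]                # (n_agents, n_actions)
    actions = (probs.cumsum(1) > rng.random((n_agents, 1))).argmax(1)
    # Mean field: empirical distribution of states across the population.
    mean_field = np.bincount(states, minlength=n_states) / n_agents
    # Population evolves under the shared transition kernel.
    next_probs = transition[states, actions]
    states = (next_probs.cumsum(1) > rng.random((n_agents, 1))).argmax(1)
```

The imitation-learning problem the paper targets is the inverse of this sketch: given only trajectories of states and actions (the output of such a generative process), recover both `policy` and `device`.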

