CAUSAL MEAN FIELD MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Scalability remains a challenge in multi-agent reinforcement learning and is currently under active research. A framework named mean-field reinforcement learning (MFRL) can alleviate the scalability problem by employing mean-field theory to turn a many-agent problem into a two-agent problem. However, this framework lacks the ability to identify essential interactions in non-stationary environments. Causality captures the relatively invariant mechanisms behind interactions, even when environments are non-stationary. We therefore propose an algorithm called causal mean-field Q-learning (CMFQ) to address the scalability problem. CMFQ inherits the compressed state-action representation of MFRL while being more robust to changes in the number of agents. First, we model the causality behind the decision-making process of MFRL as a structural causal model (SCM). Then the essentiality of each interaction is quantified by intervening on the SCM. Furthermore, we design a causality-aware compact representation of agents' behavioral information as the weighted sum of all behavioral information according to their causal effects. We test CMFQ in a mixed cooperative-competitive game and a cooperative game. The results show that our method achieves excellent scalability both when training in environments containing a large number of agents and when testing in environments containing many more agents.

1. INTRODUCTION

Multi-agent reinforcement learning (MARL) has achieved remarkable success in some challenging tasks, e.g., video games (Vinyals et al., 2019; Wu, 2019). However, training a large number of agents remains a challenge in MARL. The main reasons are that 1) the dimensionality of the joint state-action space increases exponentially as the number of agents increases, and 2) while a single agent is being trained, the policies of the other agents keep changing, causing a non-stationarity problem whose severity grows with the number of agents (Sycara, 1998; Zhang et al., 2019; Gronauer & Diepold, 2021). Existing works generally use the centralized-training-decentralized-execution paradigm to mitigate the scalability problem by mitigating non-stationarity (Rashid et al., 2018; Foerster et al., 2018; Lowe et al., 2017; Sunehag et al., 2017). Curriculum learning and attention techniques have also been used to improve scalability (Long et al., 2020; Iqbal & Sha, 2019). However, the above methods mostly focus on tens of agents. For large-scale multi-agent systems (MAS) containing hundreds of agents, studies in game theory (Blume, 1993) and mean-field theory (Stanley, 1971; Yang et al., 2018) offer a feasible framework for mitigating the scalability problem. Under this framework, Yang et al. (2018) propose an algorithm called mean-field Q-learning (MFQ), which replaces the joint action in the joint Q-function with an average action, assuming that all agent-wise interactions can be simplified into the mean of local pairwise interactions. That is, MFQ reduces the dimensionality of the joint state-action space by merging neighboring agents into a single virtual agent. However, this approach ignores the differing importance of pairwise interactions, resulting in poor robustness.
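The mean-field simplification described above can be sketched in a few lines: each agent's neighbors are merged into a single average action, which then conditions the Q-function. The sketch below is illustrative, not MFQ's exact formulation; the function names, the discrete one-hot action encoding, and the Boltzmann policy in the target are assumptions for the example.

```python
import numpy as np

def mean_action(neighbor_actions, n_actions):
    """Mean-field approximation: replace the neighbors' joint action
    with the average of their one-hot actions (a probability vector)."""
    return np.eye(n_actions)[neighbor_actions].mean(axis=0)

def mfq_target(q_next, reward, beta=1.0, gamma=0.95):
    """One-step target: reward plus discounted soft value of the next
    state, where q_next holds Q(s', a, a_bar') for each own action a
    and the policy is Boltzmann over those values."""
    pi = np.exp(beta * q_next) / np.exp(beta * q_next).sum()
    return reward + gamma * (pi * q_next).sum()

# Toy example: 4 neighbors in a 3-action game.
a_bar = mean_action(np.array([0, 2, 2, 1]), n_actions=3)
# a_bar is [0.25, 0.25, 0.5]: a fixed-size summary regardless of
# how many neighbors there are.
```

The key point is that `mean_action` has the same output dimensionality whether there are two neighbors or two hundred, which is exactly how MFQ sidesteps the exponential joint action space; the paper's criticism is that this unweighted average treats every neighbor as equally important.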
Moreover, one drawback of mean-field theory is that it does not properly account for fluctuations when few interactions exist (Uzunov, 1993) (e.g., the average action may change drastically if there are only two adjacent agents). Wang et al. (2022) attempt to improve the representational ability of the merged agent by assigning a weight to each pairwise interaction according to its attention score. However, this method requires the observations of other agents as input, making it less practical in the real world. In addition, the attention score is essentially a correlation in feature space, which seems

