VERY LARGE SCALE MULTI-AGENT REINFORCEMENT LEARNING WITH GRAPH ATTENTION MEAN FIELD

Abstract

With recent advances in reinforcement learning, we have witnessed countless successes of intelligent agents in various domains. In particular, multi-agent reinforcement learning (MARL) suits many real-world scenarios and has vast potential applications. However, typical MARL methods can only handle tens of agents, leaving scenarios with hundreds or even thousands of agents almost unexplored. There exist two key challenges in scaling up the number of agents: (1) agent-agent interactions are critical in multi-agent systems, yet the number of interactions grows quadratically with the number of agents, causing great computational complexity and difficulty in learning strategies; (2) the strengths of interactions vary among agents and over time, making such interactions difficult to model precisely. In this paper, we propose the Graph Attention Mean Field (GAT-MF) method, which converts agent-agent interactions into interactions between each agent and a weighted mean field, greatly reducing the computational complexity. We mathematically prove the correctness of this conversion. We design a graph attention mechanism to automatically capture the different and time-varying strengths of interactions, ensuring that our method precisely models the interactions among the agents. We conduct extensive experiments in both manual and real-world scenarios with more than 3000 agents, demonstrating that, compared with existing MARL methods, our method achieves superior performance and up to 9.4 times the computational efficiency.

1. INTRODUCTION

In recent years, rapid progress in reinforcement learning (RL) has greatly facilitated human decision-making in complex situations. RL algorithms and applications keep emerging in various domains such as game playing (Silver et al., 2017; Ye et al., 2021), robotics (Sinha et al., 2022; Brunke et al., 2022), public health (Bastani et al., 2021; Hao et al., 2021; 2022), and even nuclear fusion control (Degrave et al., 2022). Most successful RL works focus on single-agent scenarios, while in the real world it is common for a system to consist of multiple agents whose interactions are of vital importance. Therefore, multi-agent reinforcement learning (MARL) has especially wide applications, and developments in corresponding methods are called for. Indeed, previous researchers have produced plentiful work on MARL. For example, MADDPG (Lowe et al., 2017) outperforms single-agent DDPG in experiments with various goals, and a series of studies such as QMIX (Rashid et al., 2018), AlphaStar (Vinyals et al., 2019), and MAPPO (Yu et al., 2021) keep surpassing human professional players and refreshing the scores on StarCraft, a typical MARL benchmark. There are also various social and industrial applications of MARL methods in traffic signal control (Wang et al., 2021b), power distribution management (Wang et al., 2021a), cloud computing (Balla et al., 2021), etc. However, existing studies typically consider no more than tens of agents, while methods for scenarios with hundreds or even thousands of agents remain almost unexplored. Although such scenarios are common in the real world, there exist two key challenges in scaling up the number of agents in MARL. (1) Large number of agent-agent interactions. In multi-agent systems, the agents naturally interact with each other all the time, and such interactions are critical for the system dynamics.
Therefore, besides simple agent-environment interactions, MARL methods must take agent-agent interactions into consideration to reach good performance. However, the number of agent-agent interactions grows quadratically, following O(N^2), as the number of agents grows to N, which greatly adds to the computational complexity and makes it difficult for the agents to learn efficient strategies. (2) Varying strengths of agent-agent interactions. Due to the intrinsic dynamism of real-world scenarios, the strengths of interactions vary not only among pairs of agents but also over time, making it difficult to precisely model all the agent-agent interactions. If we manually set the interaction strength of each pair of agents according to prior knowledge of a certain scenario, repeated manual work is required whenever we want to train models for different scenarios; moreover, such manual work is practically impossible when the number of agents is large. In view of these challenges, we propose the Graph Attention Mean Field (GAT-MF) method to largely scale up the number of agents in MARL. First, to address the unaffordably large number of agent-agent interactions, we extend the previous study of the unweighted mean field (Yang et al., 2018) to a weighted one. We prove the mathematical correctness of converting interactions among the agents into interactions between each agent and a corresponding field, obtained through a weighted average over the raw agent-agent interactions. Under this conversion, the number of agent-field interactions grows only linearly, following O(N) with N agents, which greatly reduces the computational complexity and the difficulty of learning efficient strategies. Moreover, the conversion preserves, in the weights, the information about different interaction strengths among agents, which is discarded in the unweighted mean field.
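The weighted mean-field conversion described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation; the names (`actions`, `weights`, `weighted_mean_field`) are our own, and we assume the interaction weights are row-normalized so that each row forms a valid average.

```python
# Illustrative sketch of the weighted mean-field conversion: instead of
# evaluating all O(N^2) pairwise interactions, each agent i interacts with
# a single field that is a weighted average of the other agents' actions.
import numpy as np

def weighted_mean_field(actions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """actions: (N, d) per-agent actions; weights: (N, N) row-normalized
    interaction strengths (weights[i, j] = strength of j's effect on i).
    Returns the (N, d) field each agent interacts with."""
    return weights @ actions  # one matrix product yields all N fields

N, d = 4, 2
actions = np.arange(N * d, dtype=float).reshape(N, d)
weights = np.full((N, N), 1.0 / N)  # uniform weights: unweighted mean field
field = weighted_mean_field(actions, weights)
# With uniform weights, every agent sees the plain average of all actions,
# recovering the unweighted mean field (Yang et al., 2018) as a special case.
print(np.allclose(field, actions.mean(axis=0)))  # True
```

With non-uniform weights the same single matrix product applies, so the per-step cost of forming all N fields stays linear in the number of agents for sparse or low-rank weight structures, and the weights retain the heterogeneous interaction strengths.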
Second, to automatically capture the varying strengths of interactions, i.e., the weights in calculating the equivalent field, we model the relations among the agents as a graph where each node represents one agent. We design a graph attention mechanism to dynamically learn and calculate the different and time-varying interaction strengths among the agents, which requires neither prior knowledge of the system nor manual work. Third, we evaluate our GAT-MF method in (1) a grid-world manual scenario with 100 agents (see Section 5) and (2) a real-world metropolitan scenario with more than 3000 agents, built according to real-world data (see Section 6). The results show that the proposed method outperforms existing MARL methods in both scenarios, proving its ability to scale up to scenarios with a large number of agents. The results also indicate that our method reaches high computational efficiency, taking only 41.5% of the GPU memory and reaching up to 9.4 times the training speed. In summary, the main contributions of this work include:
• We prove the mathematical correctness of converting agent-agent interactions in multi-agent systems into interactions between each agent and a weighted mean field. By doing so, we greatly reduce the computational complexity and the difficulty of learning efficient strategies, making it possible to scale up to scenarios with a very large number of agents.
• We design a graph attention mechanism to automatically capture the varying strengths of the agent-agent interactions, ensuring that our method can precisely model these interactions without prior knowledge of their strengths in the scenario.
• We conduct extensive experiments in both a manual grid-world scenario and a real-world metropolitan scenario with more than 3000 agents. The results demonstrate that, compared with typical MARL methods, our method achieves superior performance in both scenarios and up to 9.4 times the computational efficiency.
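To make the graph attention mechanism concrete, the following is a hedged sketch of how attention weights over a graph of agents could be computed with scaled dot-product attention. The paper's exact architecture may differ; all names (`obs`, `Wq`, `Wk`, `adj`, `attention_weights`) and the specific scoring function are our assumptions.

```python
# Illustrative sketch: computing row-stochastic attention weights over an
# agent graph. Recomputing these weights at every step lets them track the
# time-varying interaction strengths without any prior knowledge.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(obs, Wq, Wk, adj):
    """obs: (N, d) agent observations; Wq, Wk: (d, k) learned projections;
    adj: (N, N) 0/1 adjacency matrix (1 = edge between agents).
    Returns an (N, N) matrix whose rows sum to 1: the weights used to
    average neighbor actions into each agent's mean field."""
    q, k = obs @ Wq, obs @ Wk
    scores = (q @ k.T) / np.sqrt(k.shape[1])     # scaled dot-product scores
    scores = np.where(adj > 0, scores, -1e9)     # mask out non-neighbors
    return softmax(scores, axis=1)

rng = np.random.default_rng(0)
N, d, h = 5, 3, 4
W = attention_weights(rng.normal(size=(N, d)),
                      rng.normal(size=(d, h)), rng.normal(size=(d, h)),
                      np.ones((N, N)))
print(np.allclose(W.sum(axis=1), 1.0))  # True: rows are valid averaging weights
```

Because the softmax makes each row sum to one, the output plugs directly into a weighted mean-field average; in training, `Wq` and `Wk` would be learned end-to-end rather than sampled randomly as here.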

2. RELATED WORKS

Large-Scale Tasks with RL. Many previous studies focus on solving large-scale tasks with RL. One common approach is to aggregate the large number of natural units into a relatively small number of clusters and control each cluster with one agent (Wang et al., 2021b; Qiu et al., 2021; Hao et al., 2022). Another kind of method decomposes the raw vast action space into a hierarchical one according to prior knowledge of the targeted scenario, simplifying the decision-making process (Hao et al., 2021; Ma et al., 2021; Ren et al., 2021). Other researchers incorporate human experts' solutions to help the RL agent learn more efficient strategies in large-scale scenarios (Qu et al., 2019; Li et al., 2022; Hao et al., 2022). Although these works achieve success in their targeted scenarios, the manual techniques of unit aggregation, action space decomposition, or expert-solution collection are largely problem-specific and require strong prior knowledge of the scenarios. In contrast, our method provides a direct MARL approach, requiring neither manual techniques nor prior knowledge, and thus can easily train models for solving problems in different scenarios.

