VERY LARGE SCALE MULTI-AGENT REINFORCEMENT LEARNING WITH GRAPH ATTENTION MEAN FIELD

Abstract

With recent advances in reinforcement learning, we have witnessed countless successes of intelligent agents in various domains. In particular, multi-agent reinforcement learning (MARL) is suitable for many real-world scenarios and has vast potential applications. However, typical MARL methods can only handle tens of agents, leaving scenarios with hundreds or even thousands of agents almost unexplored. There exist two key challenges in scaling up the number of agents: (1) agent-agent interactions are critical in multi-agent systems, yet the number of interactions grows quadratically with the number of agents, causing great computational complexity and difficulty in strategy learning; (2) the strengths of interactions vary among agents and over time, making it difficult to model such interactions precisely. In this paper, we propose the Graph Attention Mean Field (GAT-MF) method, which converts agent-agent interactions into interactions between each agent and a weighted mean field, greatly reducing the computational complexity. We mathematically prove the correctness of this conversion. We design a graph attention mechanism to automatically capture the different and time-varying strengths of interactions, ensuring that our method can precisely model interactions among the agents. We conduct extensive experiments in both synthetic and real-world scenarios with more than 3,000 agents, demonstrating that, compared with existing MARL methods, our method achieves superior performance and 9.4 times higher computational efficiency.
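To make the central idea concrete, the following is a minimal sketch (not the paper's actual architecture) of how attention-weighted mean-field aggregation replaces O(N^2) pairwise interaction terms with one aggregated field per agent. The projection matrices `query_w` and `key_w` and the feature dimensions are illustrative assumptions; in the real method such parameters would be learned.

```python
import numpy as np

def graph_attention_mean_field(features, query_w, key_w):
    """Replace pairwise interactions with one weighted mean field per agent.

    features: (N, d) array of agent feature vectors.
    query_w, key_w: (d, d) projection matrices (learned in a real model;
    random here, purely for illustration).
    Returns: (N, d) array whose row i is agent i's attention-weighted
    mean field over all other agents.
    """
    n, d = features.shape
    q = features @ query_w                 # (N, d) query vectors
    k = features @ key_w                   # (N, d) key vectors
    scores = q @ k.T / np.sqrt(d)          # (N, N) pairwise attention logits
    np.fill_diagonal(scores, -np.inf)      # an agent does not attend to itself
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ features              # convex combination of neighbors

rng = np.random.default_rng(0)
n_agents, dim = 5, 4
feats = rng.standard_normal((n_agents, dim))
mean_field = graph_attention_mean_field(
    feats,
    rng.standard_normal((dim, dim)),
    rng.standard_normal((dim, dim)),
)
print(mean_field.shape)  # (5, 4): one aggregated field per agent
```

Each agent then conditions its policy or value function on its own state plus this single mean-field vector, so the per-agent interaction cost is linear in the feature dimension rather than quadratic in the number of agents; the attention weights are what allow the field to reflect different and time-varying interaction strengths.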

1. INTRODUCTION

In recent years, rapid progress in reinforcement learning (RL) has greatly facilitated human decision-making in complex situations. In various domains such as game playing (Silver et al., 2017; Ye et al., 2021), robotics (Sinha et al., 2022; Brunke et al., 2022), public health (Bastani et al., 2021; Hao et al., 2021; 2022), and even nuclear fusion control (Degrave et al., 2022), RL algorithms and applications keep emerging. Most successful RL works focus on single-agent scenarios, while in the real world it is common for a system to consist of multiple agents, among which interactions are of vital importance. Therefore, multi-agent reinforcement learning (MARL) has especially wide applications, and developments in corresponding methods are called for. In fact, previous researchers have done plentiful work on MARL. For example, MADDPG (Lowe et al., 2017) outperforms single-agent DDPG in experiments with various goals, and a series of studies such as QMIX (Rashid et al., 2018), AlphaStar (Vinyals et al., 2019), and MAPPO (Yu et al., 2021) keep surpassing human professional players and setting new records on StarCraft, a typical MARL benchmark. There are also various social and industrial applications of MARL methods in traffic signal control (Wang et al., 2021b), power distribution management (Wang et al., 2021a), cloud computing (Balla et al., 2021), etc.

However, existing studies typically consider no more than tens of agents, while methods for scenarios with hundreds or even thousands of agents remain almost unexplored. Although such large-scale scenarios are common in the real world, there exist two key challenges in scaling up the number of agents in MARL methods. (1) Large number of agent-agent interactions. In multi-agent systems, the agents naturally interact with each other all the time, and such interactions are critical to the system dynamics.
Therefore, besides simple agent-environment interactions, MARL methods must take the agent-agent interactions into con-

