BI-LEVEL DYNAMIC PARAMETER SHARING AMONG INDIVIDUALS AND TEAMS FOR PROMOTING COLLABORATIONS IN MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Parameter sharing has contributed greatly to the success of multi-agent reinforcement learning in recent years. However, most existing parameter sharing mechanisms are static: parameters are shared indiscriminately among individuals, ignoring both the dynamics of the environment and the different roles that agents play. In addition, although a single-level selective parameter sharing mechanism can promote strategy diversity, it struggles to establish complementary and cooperative relationships between agents. To address these issues, we propose a bi-level dynamic parameter sharing mechanism among individuals and teams for promoting effective collaborations (BDPS). Specifically, at the individual level, we define virtual dynamic roles based on the long-term cumulative advantages of agents and share parameters among agents with the same role. At the team level, we combine agents of different virtual roles into groups and share parameters among agents in the same group. Through the joint operation of these two levels, we achieve a dynamic balance between the individuality and commonality of agents, enabling them to learn more complex and complementary collaborative relationships. We evaluate BDPS on a challenging set of StarCraft II micromanagement tasks. The experimental results show that our method outperforms current state-of-the-art baselines, and ablation experiments demonstrate the reliability of the proposed structure.

1. INTRODUCTION

Collaborative Multi-Agent Reinforcement Learning (MARL) has broad application prospects in many areas, such as robot cluster control (Buşoniu et al., 2010), multi-vehicle autonomous driving (Bhalla et al., 2020), and shop scheduling (Jiménez, 2012). In a multi-agent environment, an agent must track the environment's dynamics and understand the learning policies of other agents in order to form good collaborations. Real-world scenarios usually involve a large number of agents with different identities or capabilities, which places higher demands on collaboration among agents. Therefore, solving the large-scale MARL problem and promoting stable, complementary cooperation among agents with different identities and capabilities are particularly important.

To handle large numbers of agents, many collaborative MARL methods that adopt the centralized training paradigm use a full static parameter sharing mechanism (Gupta et al., 2017), in which all agents share the parameters of their policy networks, simplifying the algorithm structure and improving learning efficiency. This mechanism is effective largely because agents receive similar observations in the existing narrow and simple multi-agent environments. In our Google Research Football (GRF) (Kurach et al., 2020) experiments, we find that blindly applying full parameter sharing does not improve performance, because the observations of different players diverge substantially as they move. At the same time, because full static parameter sharing ignores the identities and abilities of different agents, it limits the diversity of agents' behavior policies (Li et al., 2021; Yang et al., 2022), which makes it difficult to promote complementary and reliable cooperation between agents in complex scenarios.
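The bi-level grouping idea described in the abstract, i.e. assigning agents to virtual roles by their long-term cumulative advantages and then combining agents of different roles into teams, can be sketched as follows. This is a minimal illustrative sketch under our own simplifying assumptions, not the paper's actual implementation; the function names `assign_roles` and `form_teams` and the equal-sized binning rule are hypothetical.

```python
from collections import defaultdict

def assign_roles(cumulative_advantages, n_roles):
    """Individual level (illustrative): sort agents by long-term cumulative
    advantage and split them into equal-sized bins, one bin per virtual role.
    Agents in the same role would then share one set of role parameters."""
    order = sorted(range(len(cumulative_advantages)),
                   key=lambda i: cumulative_advantages[i])
    roles = [0] * len(cumulative_advantages)
    bin_size = max(1, len(order) // n_roles)
    for rank, agent in enumerate(order):
        roles[agent] = min(rank // bin_size, n_roles - 1)
    return roles

def form_teams(roles, n_roles):
    """Team level (illustrative): combine agents of *different* roles into
    mixed groups; agents in the same group would share team parameters."""
    by_role = defaultdict(list)
    for agent, role in enumerate(roles):
        by_role[role].append(agent)
    teams = []
    while any(by_role.values()):
        # Draw at most one agent from each role to build a mixed-role team.
        team = [by_role[r].pop() for r in range(n_roles) if by_role[r]]
        teams.append(team)
    return teams

# Six agents, two virtual roles: each team mixes one low- and one
# high-advantage agent, so role and team groupings overlap but differ.
advs = [0.9, 0.1, 0.5, 0.8, 0.2, 0.6]
roles = assign_roles(advs, n_roles=2)
teams = form_teams(roles, n_roles=2)
```

Because role groups are recomputed as cumulative advantages evolve, both levels of sharing are dynamic: an agent's role-level partners and team-level partners can change over training, which is what distinguishes this scheme from full static parameter sharing.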

