GOBIGGER: A SCALABLE PLATFORM FOR COOPERATIVE-COMPETITIVE MULTI-AGENT REINFORCEMENT LEARNING

Abstract

The emergence of various multi-agent environments has motivated powerful algorithms that explore agents' cooperation and competition. Although this has greatly promoted the development of multi-agent reinforcement learning (MARL), existing environments remain insufficient for further exploration of swarm intelligence across multiple teams and cooperation among many agents, due to their limited scalability. To alleviate this, we introduce GoBigger, a scalable platform for cooperative-competitive multi-agent interactive simulation. GoBigger is an enhanced environment for Agar-like games, enabling the simulation of intra-team cooperation and inter-team competition at multiple scales. Compared with existing multi-agent simulation environments, our platform supports games with more than two teams simultaneously, which dramatically expands the diversity of agent cooperation and competition and more effectively simulates swarm behavior. Moreover, in GoBigger, cooperation among the agents in a team leads to much higher performance. We offer a diverse set of challenging scenarios, built-in bots, and visualization tools for best practices in benchmarking. We evaluate several state-of-the-art algorithms on GoBigger and demonstrate the potential of the environment. We believe this platform can inspire various emerging research directions in MARL, swarm intelligence, and large-scale agent interactive learning. Both GoBigger and its related benchmark are open-sourced.

1. INTRODUCTION

The swarm behavior of multi-agent systems (MAS) exists widely in nature and human society. In MAS, individual agents pursue their own goals and interact with each other locally, following rules of cooperation or competition, and the intelligent behavior of the agent group gives rise to complex collective behaviors. Such collective behaviors can be found in flocking birds (Bhattacharya & Vicsek, 2010), molecular motors (Chowdhury, 2006), human crowds (Helbing et al., 2000), and traffic systems (Kanagaraj & Treiber, 2018). To understand and simulate such phenomena, rule-based models (Castellano et al., 2009) can reproduce swarm behavior in an unconstrained environment with random movement. However, in a complex interactive environment such as intra-cellular molecular motor transport, where the interactions among agents are time-varying, it is challenging to recover the underlying collective behaviors by manually designing controllers or rules.

Interactive simulation of multi-agent systems provides significant convenience for multi-agent learning algorithms. Most existing multi-agent simulation environments focus on cooperation and competition between teams: agents are divided into different teams to achieve intra-team cooperation and inter-team competition. However, most of them support at most two teams, a setting we call the 2 × N mode. Consequently, they cannot handle situations where multiple teams cooperate or compete with each other, which is necessary for research on the swarm behavior of multi-agent systems. Besides, the performance gap caused by different levels of cooperation is not significant in most environments, including SMAC, Google Research Football, and NeuralMMO. For the most popular multi-agent environment, SMAC, SMACv2 (Ellis, 2022) recently showed that agents can reach high performance even after dropping teammates' observations.
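To make the rule-based modeling idea above concrete, the following is a minimal sketch of a Vicsek-style swarm: each agent moves at constant speed and aligns its heading with the average heading of nearby agents, plus random noise. All parameters (speed, neighborhood radius, noise level) are illustrative choices, not values from any cited model.

```python
import math
import random

def step_swarm(positions, headings, speed=0.05, radius=0.5, noise=0.1):
    """One update of a minimal Vicsek-style swarm.

    Each agent adopts the circular mean heading of all agents within
    `radius` (including itself), perturbed by uniform noise, then moves
    forward by `speed`.
    """
    new_headings = []
    for x, y in positions:
        sin_sum = cos_sum = 0.0
        for j, (xj, yj) in enumerate(positions):
            if math.hypot(x - xj, y - yj) <= radius:
                sin_sum += math.sin(headings[j])
                cos_sum += math.cos(headings[j])
        mean_heading = math.atan2(sin_sum, cos_sum)
        new_headings.append(mean_heading + random.uniform(-noise, noise))
    new_positions = [
        (x + speed * math.cos(h), y + speed * math.sin(h))
        for (x, y), h in zip(positions, new_headings)
    ]
    return new_positions, new_headings

random.seed(0)
pos = [(random.random(), random.random()) for _ in range(20)]
hdg = [random.uniform(-math.pi, math.pi) for _ in range(20)]
for _ in range(50):
    pos, hdg = step_swarm(pos, hdg)
```

With strong local coupling, the headings converge toward a common direction (ordered flocking) even though no agent follows a global controller, which is exactly the kind of emergent collective behavior such rule-based models capture.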
Its authors use only the identity and observation of agents in the training phase under QMIX (Rashid et al., 2018) and MAPPO (Yu et al., 2021), achieving the same performance as training that additionally uses the global state. Moreover, replay videos of these environments with well-trained agents do not exhibit complex cooperation. For environments that emphasize cooperation through game mechanics, the observation spaces are so simple that different multi-agent algorithms easily reach high performance. More details about the various multi-agent simulation environments are shown in Table 1.

In this paper, we propose a scalable platform, GoBigger, to delve into cooperative-competitive multi-agent reinforcement learning for swarm intelligence among multiple teams. Unlike previous multi-agent simulation environments, GoBigger is a scalable environment that supports varying numbers of teams and of agents per team: in the M × N game mode of GoBigger, M denotes the number of teams in the environment, and N denotes the number of players in each team. This new game mode dramatically expands the ways agents can cooperate and compete and more effectively simulates swarm behavior. In addition, GoBigger's game mechanics and rules reward intra-team cooperation and inter-team competition with higher performance, as verified in Section 6.1. We offer a diverse set of challenge scenarios in GoBigger for best practices in benchmarking. In most of the given scenarios, each player in a team is controlled by an independent agent that acts based only on its local observation or on all teammates' observations. Meanwhile, GoBigger is a more complex



Figure 1: New users can follow the given research workflow, while advanced users can customize the configuration of the environment and define new tasks based on GoBigger. The lower left part shows the basic units (balls) and the related actions. The lower right part shows the training in the league with many games going on at the same time, from where cooperative and competitive behaviors of agents can be observed.
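The M × N mode described above can be sketched as a toy environment. This is a hypothetical illustration of the team structure only, not the real GoBigger API: class and method names (`TinyMxNEnv`, `reset`, `step`) and the toy "eat" dynamics are invented for exposition. It shows M teams of N players, per-player identities, local (teammates-only) observations, and team-level scores.

```python
import random

class TinyMxNEnv:
    """Toy M x N environment: M teams, N independently controlled players
    per team. Players observe only their own team's sizes; scores are
    aggregated per team, so intra-team cooperation matters."""

    def __init__(self, team_num=4, player_num_per_team=3, seed=0):
        self.team_num = team_num
        self.player_num_per_team = player_num_per_team
        self.rng = random.Random(seed)
        # Player ids are (team_id, player_id) pairs.
        self.players = [
            (t, p)
            for t in range(team_num)
            for p in range(player_num_per_team)
        ]
        self.sizes = {pid: 1.0 for pid in self.players}

    def reset(self):
        self.sizes = {pid: 1.0 for pid in self.players}
        return self._observations()

    def _observations(self):
        # Local observation: the sizes of one's own teammates only.
        return {
            (t, p): [self.sizes[(t, q)] for q in range(self.player_num_per_team)]
            for (t, p) in self.players
        }

    def step(self, actions):
        # Toy dynamics: each "eat" action grows the acting player a little.
        for pid, action in actions.items():
            if action == "eat":
                self.sizes[pid] += self.rng.uniform(0.0, 0.1)
        # Team score = total size of its players (leaderboard-style reward).
        team_scores = {
            t: sum(self.sizes[(t, p)] for p in range(self.player_num_per_team))
            for t in range(self.team_num)
        }
        return self._observations(), team_scores

env = TinyMxNEnv(team_num=4, player_num_per_team=3)
obs = env.reset()
obs, scores = env.step({pid: "eat" for pid in env.players})
```

Setting `team_num=2` recovers the common 2 × N mode of prior environments; raising it beyond two is what enables the multi-team cooperation and competition GoBigger targets.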

AVAILABILITY

More information could be found at https://github.com/opendilab/GoBigger.

