EMERGENT COLLECTIVE INTELLIGENCE FROM MASSIVE-AGENT COOPERATION AND COMPETITION

Anonymous authors
Paper under double-blind review

Abstract

Inspired by organisms evolving through cooperation and competition between different populations on Earth, we study the emergence of artificial collective intelligence through massive-agent reinforcement learning. To this end, we propose a new massive-agent reinforcement learning environment, Lux, where massive numbers of dynamic agents in two teams scramble for limited resources and fight off the darkness. In Lux, we build our agents with a standard reinforcement learning algorithm trained in curriculum learning phases, and leverage centralized control via a pixel-to-pixel policy network. As agents co-evolve through self-play, we observe several stages of intelligence, from the acquisition of atomic skills to the development of group strategies. Since these learned group strategies arise from individual decisions without an explicit coordination mechanism, we claim that artificial collective intelligence emerges from massive-agent cooperation and competition. We further analyze the emergence of various learned strategies through metrics and ablation studies, aiming to provide insights for reinforcement learning implementations in massive-agent environments.

1. INTRODUCTION

Complex group and social behaviors are widespread among humans and animals on Earth. In a vast ecosystem, the simultaneous cooperation and competition between populations and the changing environment serve as a natural driving force for the co-evolution of massive numbers of organisms (Wolpert & Tumer, 1999; Dawkins & Krebs, 1979). This large-scale co-evolution between populations has enabled group strategies for tasks that individuals cannot accomplish alone (Ha & Tang, 2022). Inspired by this self-organizing mechanism in nature, i.e., collective intelligence emerging from massive-agent cooperation and competition, we propose to simulate the emergence of collective intelligence by training reinforcement learning agents in a massive-agent environment. We hope this can become a stepping stone toward massive-agent reinforcement learning research and an inspiring method for complex massive-agent problems.

Recent progress in multi-agent reinforcement learning (MARL) demonstrates its potential to complete complex tasks through multi-agent cooperation, such as playing StarCraft II (Vinyals et al., 2019) and Dota 2 (Berner et al., 2019). However, the number of agents in those scenarios is still limited to dozens, far from the scale of natural populations. To support large-scale multi-agent cooperation and competition, we reintroduce the massive-agent setting into multi-agent reinforcement learning. To this end, we propose Lux, a cooperative and competitive environment where hundreds of agents in two populations scramble for limited resources and fight off the darkness. We believe Lux is a suitable testbed for experimenting with collective intelligence because it provides an open environment for hundreds of agents to cooperate, compete, and evolve. From the algorithmic perspective, the massive-agent setting poses great difficulties for reinforcement learning algorithms, since the credit assignment problem becomes increasingly challenging.
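To make the massive-agent setting concrete, the following is a minimal sketch of the kind of environment loop described above: two teams of units on a shared grid, all acting simultaneously at every step. The map size, unit count, action set, and dynamics below are illustrative placeholders, not the actual Lux rules.

```python
import random

random.seed(1)

SIZE, N_UNITS = 12, 100          # hypothetical 12x12 map, 100 units per team
ACTIONS = ["N", "S", "E", "W", "stay"]

def spawn_team():
    """Place N_UNITS units at random grid cells (toy initialization)."""
    return [[random.randrange(SIZE), random.randrange(SIZE)] for _ in range(N_UNITS)]

def step(team, actions):
    """Apply one movement action per unit simultaneously, clamped to the map."""
    moves = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}
    for unit, act in zip(team, actions):
        dx, dy = moves[act]
        unit[0] = min(max(unit[0] + dx, 0), SIZE - 1)
        unit[1] = min(max(unit[1] + dy, 0), SIZE - 1)

# Two competing populations; every unit on both teams acts at every step.
teams = [spawn_team(), spawn_team()]
for _ in range(10):
    for team in teams:
        step(team, [random.choice(ACTIONS) for _ in range(N_UNITS)])
```

Even this toy loop shows why per-agent credit assignment becomes hard: 200 simultaneous decisions jointly determine a single team-level outcome at every step.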
Some research (Lowe et al., 2017) focuses on the credit assignment problem among multiple agents; however, it lacks the scalability needed for massive-agent scenarios. To overcome this, we present a centralized control solution for Lux using a pixel-to-pixel modeling architecture (Han et al., 2019) coupled with the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017). With this solution, we avoid the credit assignment problem and achieve up to a 90% win rate against the state-of-the-art policy (Isaiah et al., 2021) proposed by the Toad Brigade (TB) team, which won first place in the Lux AI competition on Kaggle (https://www.kaggle.com/c/lux-ai-2021/). Through self-play and curriculum learning phases, we observe several stages of massive-agent co-evolution, from atomic skills such as moving and building to group strategies such as efficient territory occupation and long-term resource management. Note that these group strategies arise from individual decisions without any explicit coordination mechanism or hierarchy, demonstrating how collective intelligence arises through co-evolution. Quantitative analyses provide further evidence that collective intelligence can emerge from massive-agent cooperation and competition, leading to behaviors beyond our expectations. For example, agents learn to stand in a diagonal row and move as a whole to segment off parts of the map, as shown in Figure 1. Without any prior knowledge, this efficient strategy emerges from spontaneous exploration. Furthermore, we perform a detailed ablation study to illustrate implementation techniques that may be helpful in massive-agent reinforcement learning.

(a) Blue is our policy and Yellow is TB. (b) Yellow is our policy and Blue is TB.
Figure 1: Two episodes between our policy and TB in which our Workers stand in a diagonal row. Our agents discover this as an efficient way to expand territory and limit the enemy's movement.
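The pixel-to-pixel idea above can be sketched as a single centralized network whose weights are shared across every map cell, emitting action logits for all units in one forward pass. The minimal Python sketch below substitutes a 1x1-convolution-style shared linear map for the full convolutional architecture; all sizes and names are illustrative assumptions, not the actual network.

```python
import math
import random

random.seed(0)

MAP, C_IN, N_ACT = 4, 3, 5  # hypothetical: 4x4 map, 3 obs channels, 5 actions per cell

# A 1x1 "convolution": one weight matrix shared by every map cell, so the
# same centralized policy scores actions for all units simultaneously.
W = [[random.gauss(0, 0.1) for _ in range(N_ACT)] for _ in range(C_IN)]

def policy_logits(obs):
    """obs[y][x] is a C_IN-vector of features; returns N_ACT logits per cell."""
    return [[[sum(obs[y][x][c] * W[c][a] for c in range(C_IN))
              for a in range(N_ACT)]
             for x in range(MAP)]
            for y in range(MAP)]

def sample_actions(obs):
    """Softmax-sample one action per cell; every cell with a unit acts at once."""
    logits = policy_logits(obs)
    actions = [[0] * MAP for _ in range(MAP)]
    for y in range(MAP):
        for x in range(MAP):
            z = logits[y][x]
            m = max(z)                       # subtract max for numerical stability
            probs = [math.exp(v - m) for v in z]
            s = sum(probs)
            actions[y][x] = random.choices(range(N_ACT), weights=[p / s for p in probs])[0]
    return actions

obs = [[[random.random() for _ in range(C_IN)] for _ in range(MAP)] for _ in range(MAP)]
acts = sample_actions(obs)
```

Because one network produces the whole action map, the team receives a single shared reward and no per-agent credit assignment is required.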
Our main contributions are threefold: 1) we reintroduce massive-agent reinforcement learning as a scenario for studying collective intelligence and propose a new environment, Lux, as a starting point; 2) we provide evidence that collective intelligence emerges from co-evolution through massive-agent cooperation and competition in Lux; and 3) we discuss the implementation details of our solution, which may provide valuable insights into massive-agent reinforcement learning.

2. RELATED WORK

Multi-Agent Environments. Many environments, such as the Multi-agent Particle Environment (MPE) (Lowe et al., 2017) and Google Research Football (Kurach et al., 2020), have been proposed to study multi-agent cooperation and competition. For multi-agent cooperation, the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) provides a common testbed. However, SMAC focuses on decentralized micromanagement scenarios with only approximately 30 agents in play. Among massive-agent environments, NeuralMMO (Suarez et al., 2021) is an open-ended Massively Multi-player Online (MMO) game environment with up to 1024 agents, and MAgent (Zheng et al., 2018) is a grid-world environment that supports up to a million agents. We propose Lux, a massive-agent reinforcement learning environment that can support thousands of agents acting simultaneously at each step. Unlike previous massive-agent environments, Lux incorporates Real-Time-Strategy (RTS) game dynamics similar to Battlecode (2022) and MiniRTS (Tian et al., 2017). Moreover, Lux scales up the number of agents with frequent spawns and deaths, which opens up the potential for complex strategies in such a large-scale and highly dynamic scenario.

Credit Assignment. Value decomposition methods (Sha, 2018) decompose the global value into individual values using a linear model or a neural network, which can be viewed as an implicit form of credit assignment. Another approach computes an agent-specific advantage function; for example, Foerster et al. (2017) use counterfactual regret to measure each agent's contribution to the team. In complex games, Berner et al. (2019) and Ye et al. (2020) use hand-crafted team-based rewards for each agent as an explicit method of credit assignment. Compared to the implicit value decomposition method, this
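The two credit-assignment families discussed here can be illustrated with a toy example: a value-decomposition-style sum of per-agent values, and a counterfactual advantage that marginalizes out one agent's action against its own policy. The joint-Q table and all numbers below are hand-made placeholders, not results from any cited method.

```python
# Implicit credit assignment: the joint value is the sum of per-agent values
# (a linear decomposition, as in value decomposition methods).
def joint_q_vdn(per_agent_q):
    return sum(per_agent_q)

# Counterfactual advantage: compare the joint action's value against a
# baseline that marginalizes out one agent's own action under its policy.
# Toy setting: 2 agents, 2 actions each, hand-made joint Q table.
Q = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 3.0}
pi = [0.5, 0.5]  # each agent's (uniform) policy over its two actions

def counterfactual_advantage(agent, joint_action):
    """A_i(s, a) = Q(s, a) - sum_{a_i'} pi(a_i') * Q(s, (a_-i, a_i'))."""
    baseline = sum(
        pi[a_i] * Q[joint_action[:agent] + (a_i,) + joint_action[agent + 1:]]
        for a_i in (0, 1)
    )
    return Q[joint_action] - baseline
```

Under joint action (1, 1), agent 0's advantage is 3.0 - (0.5*0.0 + 0.5*3.0) = 1.5 while agent 1's is 3.0 - (0.5*2.0 + 0.5*3.0) = 0.5, so each agent is credited only for its own marginal contribution.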

