PETTINGZOO: GYM FOR MULTI-AGENT REINFORCEMENT LEARNING

Abstract

This paper introduces PettingZoo, a library of diverse multi-agent environments under a single elegant Python API. PettingZoo was developed with the goal of accelerating research in multi-agent reinforcement learning by creating a set of benchmark environments that are easily accessible to all researchers, along with a standardized API for the field. This goal is inspired by what OpenAI's Gym library did to accelerate research in single-agent reinforcement learning, and PettingZoo draws heavily from Gym in terms of API and user experience. PettingZoo is unique among multi-agent environment libraries in that its API is based on the model of Agent Environment Cycle ("AEC") games, which allows all varieties of games to be sensibly represented under one API for the first time. While retaining a very simple and Gym-like API, PettingZoo still allows access to the low-level environment properties required by nontraditional learning methods.

1. INTRODUCTION

Reinforcement Learning ("RL") considers learning a policy (a function that takes in an observation from an environment and emits an action) that achieves the maximum expected discounted reward when acting in an environment; its capabilities have been among the great successes of modern machine learning. Multi-Agent Reinforcement Learning (MARL) in particular has been behind many of the most publicized achievements of modern machine learning (AlphaGo Zero (Silver et al., 2017), OpenAI Five (OpenAI, 2018), AlphaStar (Vinyals et al., 2019)) and has seen a boom in recent years. However, popular benchmark environments are scattered across many different locations (or made from scratch), are based around heterogeneous APIs, and are often unmaintained. Because of this, highly influential research in the field is generally restricted to institutions with dedicated engineering teams, new methods are generally not compared in like environments, and progress has been slow compared to single-agent reinforcement learning (though this obviously cannot be attributed to benchmarks alone). Motivated by this, we introduce PettingZoo: a Python library collecting maintained versions of all popular MARL environments under a single simple Python API similar to that of OpenAI's Gym library. It is available on PyPI and can be installed via pip install pettingzoo.

2. A TALE OF TOO MANY LIBRARIES

OpenAI Gym (Brockman et al., 2016) was introduced shortly after the potential of reinforcement learning became widely known with Mnih et al. (2015). At the time, doing basic research in reinforcement learning was a large engineering challenge. The most popular set of environments was the Atari games included in the Arcade Learning Environment ("ALE") (Bellemare et al., 2013). The ALE was originally challenging to compile and install, had an involved C API, and later gained an unofficial fork with a Python wrapper (Goodrich, 2015). A scattering of other environments existed as independent projects, in various languages, all with unique APIs. This level of heterogeneity meant that reinforcement learning code had to be adapted to every environment (including bridging programming languages). Accordingly, standardized reinforcement learning implementations weren't possible, comparisons against a wide variety of environments were very difficult, and doing simple research in reinforcement learning was generally restricted to organizations with software engineering divisions. Gym was created to promote research in reinforcement learning by making comprehensive benchmarking more accessible, by allowing algorithm reuse, and by letting average machine learning researchers access the environments. This last point was achieved by putting every environment that a researcher would want to benchmark with (at the time of creation) under one simple API that anyone could understand, in Python (which was just starting to become the lingua franca of machine learning). This led to a mass proliferation of reinforcement learning research (especially at smaller institutions), many environments compliant with the API (Kidziński et al., 2018; Leurent, 2018; Zamora et al., 2016), and many RL libraries based around the API (Hill et al., 2018; Liang et al., 2018; Kuhnle et al., 2017). In the multi-agent space, a similar level of fragmentation currently exists.
Notable heterogeneous sets of environments include OpenAI's Competitive Multi-Agent Environments for competitive robotic control (Bansal et al., 2017), Sequential Social Dilemma Games for games where cooperation is difficult in a game-theoretic sense (Leibo et al., 2017), RLCard for various card games (Zha et al., 2019), MAgent for huge numbers of agents (Zheng et al., 2017), Multi-Particle Environments ("MPE") for diverse agent roles (Mordatch and Abbeel, 2017; Lowe et al., 2017), the StarCraft Multi-Agent Challenge (Samvelyan et al., 2019), and dozens more.

3. RELATED WORKS

Two attempts at some level of unification in the multi-agent space have been made. The first is OpenSpiel, released by DeepMind in 2019 (Lanctot et al., 2019), which includes excellent implementations of 45 classic games under one sensible API. However, their framework is limited to simple discrete games due to its modeling of games as trees (which is impractical for more continuous environments such as Atari). Within the space of discrete games, though, it has encouraged high-quality and fair evaluations of general game-solving methods. The second is the multi-agent API of RLlib (Liang et al., 2018), an ambitious distributed RL framework. While the API is powerful, it has very minimal feature support and cannot sensibly represent strictly turn-based games (e.g., Hanabi or Go). However, we do maintain PettingZoo support within RLlib so that users can easily apply the learning methods it includes to our environments.

Simplicity and Similarity to Gym

The ability of the Gym API to be understood almost instantly has been a large driving factor in its widespread adoption. While a multi-agent API inherently adds complexity, we wanted to create a similarly simple API, one that would be instantly familiar to researchers who have worked with Gym.

Agent Environment Cycle Games Based API

Most environments have APIs that model agents as all stepping at once (Lowe et al., 2017; Zheng et al., 2017; Gupta et al., 2017; Liu et al., 2019; Liang et al., 2018), based on the Partially Observable Stochastic Games (POSGs) model. It turns out this easily results in bugs (Terry et al., 2020b) and is undesirable for strictly turn-based games, like chess, since their agents are not allowed to step simultaneously. We instead model our API after the new Agent Environment Cycle games model, an equivalent model where agents step sequentially. That is, an agent performs an action, the environment responds, the next agent acts, the environment responds again, and the cycle repeats. This model allows for sensible interaction with both strictly turn-based games like chess and games where agents truly step simultaneously. A POSG can be easily converted to an equivalent sequential game by having each agent take a step in a cycle, and then updating the rewards and observations of the AEC game at the end of the cycle. For a formal proof of equivalence see (Terry et al., 2020b).
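The cycle described above can be sketched with a self-contained toy environment. This is purely an illustration of the AEC model, not PettingZoo's actual API: the class and method names below are hypothetical, though the loop shape deliberately mirrors a Gym-like agent-by-agent interaction.

```python
class ToyAECEnv:
    """Toy AEC-style environment: two agents alternate incrementing a
    shared counter, and the environment updates after every single
    agent's step, as in the AEC games model. Illustrative only."""

    def __init__(self):
        self.agents = ["player_0", "player_1"]
        self.counter = 0
        self._idx = 0  # index of the agent whose turn it is

    @property
    def agent_selection(self):
        # The single agent allowed to act right now.
        return self.agents[self._idx]

    def observe(self, agent):
        # Every agent observes the shared counter.
        return self.counter

    def step(self, action):
        # The environment responds immediately after each agent acts,
        # then selects the next agent in the cycle.
        self.counter += action
        self._idx = (self._idx + 1) % len(self.agents)

    def done(self):
        return self.counter >= 4

    def agent_iter(self):
        # Yield one acting agent at a time until the episode ends.
        while not self.done():
            yield self.agent_selection


env = ToyAECEnv()
history = []
for agent in env.agent_iter():
    obs = env.observe(agent)
    history.append((agent, obs))
    env.step(1)  # a fixed "policy" for illustration

# Agents acted strictly sequentially, each seeing the environment's
# response to the previous agent's action:
# history == [("player_0", 0), ("player_1", 1), ("player_0", 2), ("player_1", 3)]
```

Note that, unlike a POSG-style API where one call consumes a joint action from all agents, each `step` here takes exactly one agent's action, which is what makes strictly turn-based games representable without dummy actions.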

Variable Number of Agents

We designed our API to robustly support the widest range of multi-agent scenarios possible, including agent generation and death. No general API currently supports this, but it is such an integral feature of so many environments that this support is essential.
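One way interaction code can stay robust to agent generation and death is to treat the set of live agents as mutable and always iterate over it, rather than over a fixed roster. The sketch below illustrates this pattern with a hypothetical toy environment; none of these names are PettingZoo's API.

```python
class SpawningEnv:
    """Toy environment with agent death and generation: each step, the
    acting agent 'dies', and the first two deaths each spawn one
    replacement agent. Illustrative only."""

    def __init__(self):
        self.agents = ["agent_0"]  # the list of currently live agents
        self._spawned = 1
        self._max_spawns = 3

    def step(self, agent):
        self.agents.remove(agent)          # agent death
        if self._spawned < self._max_spawns:
            new_agent = f"agent_{self._spawned}"
            self.agents.append(new_agent)  # agent generation
            self._spawned += 1


env = SpawningEnv()
seen = []
# Loop over whichever agents are alive, rather than a fixed roster.
while env.agents:
    agent = env.agents[0]
    seen.append(agent)
    env.step(agent)

# seen == ["agent_0", "agent_1", "agent_2"]: agents created mid-episode
# were handled by the same loop, and the loop ended when none remained.
```

Because the loop consults the live-agent list on every iteration, neither the death of `agent_0` nor the mid-episode creation of `agent_1` and `agent_2` requires any special-case handling.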


