MARLLIB: EXTENDING RLLIB FOR MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Despite the rapid development of multi-agent reinforcement learning (MARL) methods, there is a lack of commonly acknowledged baseline implementations and evaluation platforms. As a result, an urgent need for MARL researchers is an integrated library suite that, similar to the role of RLlib in single-agent RL, delivers reliable MARL implementations and replicable evaluation across various benchmarks. To fill this research gap, we propose Multi-Agent RLlib (MARLlib), a comprehensive MARL algorithm library that extends RLlib for solving multi-agent problems. With a novel design of agent-level distributed dataflow, MARLlib manages to unify tens of algorithms, including various independent learning, centralized critic, and value decomposition methods; this leads to a highly composable integration of MARL algorithms that could not be unified before. Furthermore, MARLlib goes beyond existing work by integrating diverse environment interfaces and providing flexible parameter-sharing strategies; this allows end users to create versatile solutions to cooperative, competitive, and mixed tasks with minimal code modification. A plethora of experiments are conducted to substantiate the correctness of our implementation, from which we further derive new insights on the relationship between performance and the design of algorithmic components. With MARLlib, we expect researchers to be able to tackle broader real-world multi-agent problems with trustworthy solutions. Our code¹ and documentation² are released for reference.

1. INTRODUCTION

Multi-Agent Reinforcement Learning (MARL) is a prosperous research field with many real-world applications and revolutionary potential for advanced collective intelligence [6, 38, 36]. Existing work [2, 33, 5] has shown that agents can learn strategies that outperform human experts and, in turn, help guide human decision-making. Significant as these outcomes are, the algorithm implementations are usually task-specific, making it hard to compare algorithm performance, observe algorithm robustness across tasks, or use the implementations off the shelf. Thus, a commonly acknowledged baseline implementation and a unified tool suite for MARL research are in urgent demand.

While single-agent RL has witnessed successful unification of both algorithms (e.g., SpinningUp [1], Tianshou [35], RLlib [19], Dopamine [7], and the Stable-Baselines series [10, 12, 25]) and environments (e.g., Gym [4]), multi-agent RL poses unique challenges to building a comprehensive, high-quality library.

Firstly, MARL algorithm pipelines are diverse. Algorithms diverge in their learning targets: some learn to cooperate as a group, while others compete with other agents and seek a strategy that maximizes individual reward while minimizing others'. Algorithms also place different restrictions on agent parameter-sharing strategies, with HATRPO agents forbidden to share parameters and MAPPO capitalizing on sharing. Different styles of utilizing central information, such as mixing value functions (e.g., VDN [30]) or centralizing the value function (e.g., MADDPG [20]), introduce a further challenge to unifying algorithm learning styles. Existing libraries such as EPyMARL [23] attempt to unify MARL algorithms under one framework through an independent learning, centralized critic, and value decomposition categorization, but still lack the effort to address
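The three categories above differ mainly in how each agent's value target is formed. The following toy sketch (illustrative only, not MARLlib's implementation) contrasts them on a small batch of per-agent Q-values:

```python
import numpy as np

# Toy batch: chosen-action Q-values for 3 agents over a batch of 2.
per_agent_q = np.array([[1.0, 2.0, 3.0],
                        [0.5, 0.5, 0.5]])  # shape (batch=2, n_agents=3)

# Independent learning (e.g. IQL): each agent trains on its own Q only;
# no central information is used.
independent_targets = per_agent_q

# Value decomposition (e.g. VDN): the joint Q is the sum of per-agent
# utilities, so a single team reward can supervise the joint value.
vdn_joint_q = per_agent_q.sum(axis=-1)  # shape (batch,)

# Centralized critic (e.g. MADDPG): one critic sees every agent's
# observation and action; here we only form its concatenated input.
obs = [np.ones(4), np.zeros(4), np.ones(4)]       # one local obs per agent
acts = [np.array([0.1]), np.array([0.2]), np.array([0.3])]
critic_input = np.concatenate(obs + acts)         # shape (3*4 + 3*1,) = (15,)
```

The contrast makes the unification difficulty concrete: the same per-agent data must feed three structurally different central-information pathways.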



¹ https://github.com/ICLR2023Paper4242/MARLlib
² https://iclr2023marllib.readthedocs.io/

all the problems above. The diversity of MARL algorithms thus remains a major obstacle to unification.

Secondly, the interfaces of multi-agent environments are mutually inconsistent, as each is originally designed around the nature of its task (e.g., asynchronous interaction in Hanabi, action masks provided as additional information in SMAC [28], local observations mixed with the global state in MAgent [39]). This inconsistency prevents a directly unified agent-environment interaction pipeline and couples algorithm implementations to task environments: an implementation written for one environment cannot be applied directly to another because the interface changes. While PettingZoo [32] builds a collection of diverse multi-agent tasks, it is inconvenient for implementing CTDE-based algorithms because important information such as the global state and action mask is not exposed explicitly. To work around the inconsistency, other work, such as the MAPPO benchmark [37], provides each environment with a unique runner script. Nevertheless, this solution hampers long-term maintenance and complicates extension to new tasks.

To address the above challenges in one work, we build a new library called MARLlib on top of Ray [22] and RLlib. By inheriting the core advantages of RLlib and providing the following four novel features, MARLlib serves as a comprehensive platform for the MARL research community.

1. Unified algorithm pipeline with a newly proposed agent-level distributed dataflow: To unify algorithms across diverse MARL topics and let them share the same learning pipeline while preserving their unique optimization logic, we construct MARLlib under the guidance of a key observation: every multi-agent learning paradigm can be equivalently transformed into a combination of single-agent learning processes; thus each agent maintains its own dataflow and optimizes its policy regardless of the other agents.
With this philosophy, algorithms are implemented in a unified pipeline that tackles various types of tasks, including cooperative (team-reward-only cooperation), collaborative (individual-reward-accessible cooperation), competitive (individual competition), and mixed (teamwork-based competition) tasks. We further categorize algorithms by how they utilize central information, thereby enabling module sharing and extensibility. As shown in Figure 1, MARLlib manages to unify tens of algorithms with the proposed agent-level distributed dataflow, validating its effectiveness.

2. Unified multi-agent environment interface: To fully decouple algorithms from environments, we propose a new interface following the Gym standard, with a data-structure design that is compatible with most existing multi-agent environments, supports asynchronous agent-environment interaction, and provides the necessary information to algorithms. [3]) picked from the zoo of multi-agent tasks because of their inter-diversity, covering various task
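A unified interface of this kind could be sketched as below. The field names (`"obs"`, `"state"`, `"action_mask"`) are our assumption for illustration, not necessarily MARLlib's exact keys: each agent's observation is a dict carrying the local observation, the global state needed by CTDE algorithms, and a mask over invalid actions.

```python
import numpy as np

def make_obs(local_obs, global_state, n_actions, legal_actions):
    """Build one agent's observation dict (field names are illustrative)."""
    mask = np.zeros(n_actions, dtype=np.int8)
    mask[legal_actions] = 1  # 1 = action is currently legal
    return {"obs": np.asarray(local_obs),          # agent's local view
            "state": np.asarray(global_state),     # for CTDE critics/mixers
            "action_mask": mask}                   # for masked policies

# Asynchronous interaction: only the agents present in the returned dict
# act this step, so turn-based games such as Hanabi fit the same interface.
step_obs = {"agent_0": make_obs(local_obs=[0.1, 0.2],
                                global_state=[0.1, 0.2, 0.3],
                                n_actions=4, legal_actions=[0, 2])}
```

With such a structure, algorithms read the same keys regardless of which environment produced them, which is what decouples the implementation from the task.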

