MARLLIB: EXTENDING RLLIB FOR MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Despite the fast development of multi-agent reinforcement learning (MARL) methods, there is a lack of commonly acknowledged baseline implementations and evaluation platforms. As a result, an urgent need for MARL researchers is an integrated library suite that, similar to the role of RLlib in single-agent RL, delivers reliable MARL implementations and replicable evaluation across various benchmarks. To fill this research gap, we propose Multi-Agent RLlib (MARLlib), a comprehensive MARL algorithm library that extends RLlib for solving multi-agent problems. With a novel design of agent-level distributed dataflow, MARLlib manages to unify tens of algorithms, including different types of independent learning, centralized critic, and value decomposition methods; this yields a highly composable integration of MARL algorithms that was not possible before. Furthermore, MARLlib goes beyond existing work by integrating diverse environment interfaces and providing flexible parameter-sharing strategies, allowing end users to create versatile solutions to cooperative, competitive, and mixed tasks with minimal code modification. Extensive experiments substantiate the correctness of our implementation, based on which we further derive new insights into the relationship between performance and the design of algorithmic components. With MARLlib, we expect researchers to be able to tackle broader real-world multi-agent problems with trustworthy solutions. Our code 1 and documentation 2 are released for reference.

1. INTRODUCTION

Multi-Agent Reinforcement Learning (MARL) is a prosperous research field that has many real-world applications and holds revolutionary potential for advanced collective intelligence [6, 38, 36]. Existing work [2, 33, 5] has shown that agents are able to learn strategies that outperform human experts and, in turn, help guide humans' decision-making. Significant as these outcomes are, the algorithm implementations are often task-specific, making it hard to compare algorithm performance, assess algorithm robustness across tasks, or use the implementations off the shelf. Thus, a commonly acknowledged baseline implementation and a unified tool suite for MARL research are in urgent demand.

While single-agent RL has witnessed successful unification of both algorithms (e.g., SpinningUp [1], Tianshou [35], RLlib [19], Dopamine [7], and the Stable-Baselines series [10, 12, 25]) and environments (e.g., Gym [4]), multi-agent RL poses unique challenges for building a comprehensive, high-quality library. First, MARL algorithm pipelines are diverse. Algorithms diverge in their learning targets: some train agents to act as a group and learn to cooperate, while others pit agents against one another to find strategies that maximize individual reward while minimizing others'. Algorithms also place different restrictions on parameter-sharing strategies; for example, HATRPO agents must not share parameters, whereas MAPPO capitalizes on sharing. Different styles of utilizing central information, such as mixing value functions (e.g., VDN [30]) or centralizing the value function (e.g., MADDPG [20]), introduce extra challenges for unifying algorithm learning styles. Existing libraries such as EPyMARL [23] attempt to unify MARL algorithms under one framework by categorizing them into independent learning, centralized critic, and value decomposition methods, but still lack the effort to address
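To make the distinction between the three algorithm families concrete, the following minimal sketch contrasts how each one forms the value signal used for learning. This is an illustration of the general ideas only, not MARLlib's or any library's actual API; the function names and the linear stand-in critic are our own simplifications.

```python
import numpy as np

def independent_values(per_agent_q):
    """Independent learning: each agent trains on its own Q-value;
    no central information is used."""
    return per_agent_q

def vdn_mixed_value(per_agent_q):
    """Value decomposition (VDN-style): the joint Q-value is the sum
    of per-agent Q-values, implicitly decomposing credit."""
    return float(np.sum(per_agent_q))

def centralized_critic_value(joint_obs, weights):
    """Centralized critic (MADDPG-style): one critic consumes the
    concatenated observations/actions of all agents. A linear map
    stands in here for a learned neural-network critic."""
    return float(joint_obs @ weights)

# Toy example: two agents whose chosen-action Q-values are 1.0 and 2.0.
q = np.array([1.0, 2.0])
print(independent_values(q))      # each agent keeps its own value
print(vdn_mixed_value(q))         # 3.0: summed joint value

# The centralized critic instead sees all agents' observations at once.
joint_obs = np.concatenate([np.array([0.5, -0.5]), np.array([1.0, 0.0])])
weights = np.full(4, 0.25)        # placeholder critic parameters
print(centralized_critic_value(joint_obs, weights))  # 0.25
```

A unified library must support all three dataflows at once, which is precisely what MARLlib's agent-level distributed dataflow is designed to enable.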



1 https://github.com/ICLR2023Paper4242/MARLlib
2 https://iclr2023marllib.readthedocs.io/

