CAMA: A NEW FRAMEWORK FOR SAFE MULTI-AGENT REINFORCEMENT LEARNING USING CONSTRAINT AUGMENTATION

Abstract

With the widespread application of multi-agent reinforcement learning (MARL) in real-life settings, the ability to meet safety constraints has become an urgent problem. For example, multiple drones must avoid collisions while pursuing a common goal. We address this problem by introducing the Constraint Augmented Multi-Agent framework (CAMA). CAMA serves as a plug-and-play module for popular MARL algorithms, including those based on centralized training with decentralized execution and on independent learning. In our approach, we represent the safety constraint as the sum of discounted safety costs bounded by a predefined value, which we call the safety budget. Experiments demonstrate that CAMA converges quickly to a high degree of constraint satisfaction and surpasses state-of-the-art safe counterpart algorithms in both cooperative and competitive settings.

1. INTRODUCTION

Multi-agent problems are ubiquitous in the real world, arising in robotics (Al-Abbasi et al., 2019; Mguni et al., 2021), transportation systems (Zhou et al., 2020; Chu et al., 2019), network optimization (Wang et al., 2020; Wai et al., 2018), and multi-player video games (Du et al., 2019; Samvelyan et al., 2019; Han et al., 2019; Peng et al., 2017). A modern approach to solving these decision-making problems is multi-agent reinforcement learning (MARL), which tackles them using only interactions with the environment. There are several frameworks within MARL, such as fully centralized learning (Berner et al., 2019; Sukhbaatar & Fergus, 2016), independent learning (IL) (de Witt et al., 2020; Zhang et al., 2018), and a hybrid framework, centralized training with decentralized execution (CTDE) (Foerster et al., 2018; Lowe et al., 2017; Yang et al., 2018). However, in the deployment of MARL, safety remains a crucial problem that has not been fully solved. In recent years, several works have incorporated safety constraints into RL training, for example by optimizing policies under constraints (Di Castro et al., 2012; Tessler et al., 2018; Achiam et al., 2017; Chow et al., 2018), adding safety layers (Dalal et al., 2018), or constructing verifiable safe exploration (Anderson et al., 2020). In the context of safe MARL, recent work extends constrained policy optimization (Achiam et al., 2017) to the multi-agent domain (Gu et al., 2021) as a model-free safe MARL algorithm, but it still suffers from low reward performance compared to non-safe MARL algorithms. Other works perform constrained policy optimization by transforming it into a min-max game (Lu et al., 2021; Liu et al., 2021). However, being tied to a specific framework, these methods cannot generalize to other settings, such as competitive games. Therefore, a general safe MARL framework with high reward performance is still lacking at this stage.
To fill this gap, in this paper we propose a general module that can be incorporated into different MARL algorithms. The proposed Constraint Augmented Multi-Agent framework, coined CAMA, is a plug-and-play method that enables cutting-edge non-safe MARL algorithms to satisfy the added constraints. Furthermore, CAMA addresses both cooperative and competitive settings under the CTDE and IL frameworks. In our algorithm, we represent the safety constraint as the sum of discounted safety costs bounded by a pre-defined scalar, which we call the safety budget.
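To make the constraint concrete, the following is a minimal sketch of how a discounted safety-cost constraint could be checked against a safety budget. The function names, the discount factor value, and the per-step cost sequence are illustrative assumptions, not part of the CAMA algorithm itself:

```python
def discounted_safety_cost(costs, gamma=0.99):
    """Sum of discounted per-step safety costs along one agent's trajectory.

    costs: sequence of non-negative per-step safety costs c_t
    gamma: discount factor in (0, 1] (0.99 is an assumed value)
    """
    return sum((gamma ** t) * c for t, c in enumerate(costs))

def satisfies_budget(costs, safety_budget, gamma=0.99):
    """Constraint form used in the paper: the discounted cost sum
    must stay at or below the predefined safety budget."""
    return discounted_safety_cost(costs, gamma) <= safety_budget

# Example: a unit safety cost incurred at each of 5 steps.
trajectory_costs = [1.0, 1.0, 1.0, 1.0, 1.0]
total = discounted_safety_cost(trajectory_costs)   # ≈ 4.901
print(satisfies_budget(trajectory_costs, safety_budget=10.0))  # True
print(satisfies_budget(trajectory_costs, safety_budget=1.0))   # False
```

In practice such a quantity would be estimated from sampled trajectories (or via a learned cost critic) rather than computed from a fixed list, but the budget comparison takes the same form.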

