LEARNING SAFE MULTI-AGENT CONTROL WITH DECENTRALIZED NEURAL BARRIER CERTIFICATES

Abstract

We study the multi-agent safe control problem, in which agents must avoid collisions with static obstacles and with each other while reaching their goals. Our core idea is to learn the multi-agent control policy jointly with control barrier functions that serve as safety certificates. We propose a new joint-learning framework that can be implemented in a decentralized fashion and can adapt to an arbitrarily large number of agents. Building upon this framework, we further improve scalability by incorporating neural network architectures that are invariant to the quantity and permutation of neighboring agents. In addition, we propose a new spontaneous policy refinement method that further enforces the certificate condition during testing. We provide extensive experiments demonstrating that our method significantly outperforms other leading multi-agent control approaches in both maintaining safety and completing the original tasks. Our approach also shows substantial generalization capability: a control policy trained with 8 agents in one scenario can be used in other scenarios with up to 1024 agents, with complex multi-agent environments and dynamics. Videos and source code can be found on the project website [1].

1. INTRODUCTION

Machine learning (ML) has created unprecedented opportunities for achieving full autonomy. However, learning-based methods in autonomous systems (AS) can and do fail due to the lack of formal guarantees and limited generalization capability, which poses significant challenges for developing safety-critical AS, especially large-scale multi-agent AS, that are provably dependable. On the other hand, safety certificates (Chang et al., 2019; Jin et al., 2020; Choi et al., 2020), which are widely used in control theory and formal methods, serve as proofs that a system satisfies desired properties under certain control policies. For example, once found, a Control Barrier Function (CBF) ensures that the closed-loop system always stays inside some safe set (Wieland & Allgöwer, 2007; Ames et al., 2014) when paired with a CBF Quadratic Programming (QP) supervisory controller. However, it is extremely difficult to synthesize CBFs by hand for complex dynamical systems, which has spurred a growing interest in learning-based CBFs (Saveriano & Lee, 2020; Srinivasan et al., 2020; Jin et al., 2020; Boffi et al., 2020; Taylor et al., 2020; Robey et al., 2020). All of these studies, however, concern only single-agent systems. How to develop learning-based approaches for safe multi-agent control that are both provably dependable and scalable remains an open question.

In multi-agent control, there is a constant dilemma: centralized control strategies can hardly scale to a large number of agents, while decentralized control without coordination often lacks safety and performance guarantees. In this work, we propose a novel learning framework that jointly designs multi-agent control policies and safety certificates from data, can be implemented in a decentralized fashion, and scales to an arbitrary number of agents.
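To make the CBF-QP supervisory controller concrete, consider the simplest single-agent case. The filter solves min ||u - u_ref||^2 subject to ∇h(x)·u + αh(x) ≥ 0, so the nominal control is modified as little as possible while keeping the barrier condition satisfied. The sketch below is our own illustrative example (not the paper's implementation): it assumes single-integrator dynamics ẋ = u and a single obstacle-avoidance barrier h(x) = ||x - x_obs||² - r², for which the one-constraint QP has a closed-form projection solution.

```python
import numpy as np

def cbf_qp_filter(x, u_ref, x_obs, r, alpha=1.0):
    """Closed-form CBF-QP safety filter for single-integrator dynamics x' = u.

    Solves   min ||u - u_ref||^2   s.t.   grad_h(x) . u + alpha * h(x) >= 0,
    where h(x) = ||x - x_obs||^2 - r^2 keeps the agent outside radius r
    and alpha > 0 is the class-K gain. With a single linear constraint,
    the QP reduces to a projection onto a half-space.
    """
    h = np.dot(x - x_obs, x - x_obs) - r ** 2   # barrier value (> 0 means safe)
    grad_h = 2.0 * (x - x_obs)                  # gradient of h at x
    residual = grad_h @ u_ref + alpha * h       # constraint value at u_ref
    if residual >= 0:                           # nominal control already safe
        return u_ref
    # Project u_ref onto {u : grad_h . u >= -alpha * h(x)}
    return u_ref - residual / (grad_h @ grad_h) * grad_h

# Agent at (1, 0), obstacle of radius 0.5 at the origin, nominal control
# pushing straight toward the obstacle: the filter brakes just enough to
# satisfy the barrier condition with equality.
x = np.array([1.0, 0.0])
u_safe = cbf_qp_filter(x, np.array([-2.0, 0.0]), np.array([0.0, 0.0]), 0.5)
```

The multi-agent setting studied in the paper replaces the hand-written h with learned, decentralized barrier functions shared across agents, but the filtering principle is the same.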
Specifically, we first introduce the notion of decentralized CBFs as safety certificates, and then propose a framework for learning decentralized CBFs with generalization error guarantees. The decentralized CBF can be seen as a contract among agents, which allows them to learn a mutual agreement on how to avoid collisions. Once such a controller is obtained through the joint-learning framework, it can be applied to an arbitrary number of agents and in scenarios that differ from the training scenarios, which resolves the fundamental scalability issue in multi-agent control. We also propose several effective techniques in Section 4 to make the learning process even more scalable and practical, which are validated extensively in Section 5. The experimental results are indeed promising. We study both 2D and 3D safe multi-agent control problems, each with several distinct environments and complex nonholonomic dynamics. Our joint-learning framework performs exceptionally well: control policies trained on scenarios with 8 agents can be used on up to 1024 agents while maintaining low collision rates, which notably pushes the boundary of learning-based safe multi-agent control. In fact, 1024 agents is not a limit of our approach, but rather of the computational capacity of the laptop used for the experiments. We also compare our approach with both leading learning-based methods (Lowe et al., 2017; Zhang & Bastani, 2019; Liu et al., 2020) and traditional planning methods (Ma et al., 2019; Fan et al., 2020). Our approach outperforms all of them in terms of both completing the tasks and maintaining safety.

Contributions. Our main contributions are three-fold: 1) We propose the first framework for jointly learning safe multi-agent control policies and CBF certificates in a decentralized fashion.
2) We present several techniques that make the learning framework more effective and scalable for practical multi-agent systems, including neural network architectures that are invariant to the quantity and permutation of neighboring agents. 3) We demonstrate via extensive experiments that our method significantly outperforms other leading methods and generalizes exceptionally well to unseen scenarios and an arbitrary number of agents, even in quite complex multi-agent environments such as ground robots and drones. A video demonstrating the performance of our method can be found in the supplementary material.

Related Work. Learning-Based Safe Control via CBF. Barrier certificates (Prajna et al., 2007) and CBFs (Wieland & Allgöwer, 2007) are well-known, effective tools for guaranteeing the safety of nonlinear dynamical systems. However, existing methods for constructing CBFs either rely on specific problem structures (Chen et al., 2017b) or do not scale well (Mitchell et al., 2005). Recently, there has been increasing interest in learning-based and data-driven safe control via CBFs, which primarily falls into two categories: learning CBFs from data (Saveriano & Lee, 2020; Srinivasan et al., 2020; Jin et al., 2020; Boffi et al., 2020), and CBF-based approaches for controlling unknown systems (Wang et al., 2017; 2018; Cheng et al., 2019; Taylor et al., 2020). Our work is more pertinent to the former and complementary to the latter, which usually assumes that the CBF is provided. None of these learning-enabled approaches, however, has addressed the multi-agent setting.

Multi-Agent Safety Certificates and Collision Avoidance. For holonomic systems, safety in multi-agent systems has been guaranteed by limiting the velocities of the agents (Van den Berg et al., 2008; Alonso-Mora et al., 2013). Later, Borrmann et al. (2015) and Wang et al. (2017) proposed the framework of multi-agent CBFs to generate collision-free controllers, with either perfectly known system dynamics (Borrmann et al., 2015) or worst-case uncertainty bounds (Wang et al., 2017). Recently, Chen et al. (2020) proposed a decentralized controller-synthesis approach under this CBF framework that is scalable to an arbitrary number of agents. However, in Chen et al. (2020) the CBF controller relies on online integration of the dynamics under a backup strategy, which can be computationally challenging for complex systems. Due to space limits, we omit other non-learning multi-agent control methods but acknowledge their importance.

Safe Multi-Agent (Reinforcement) Learning (MARL). Safety concerns have drawn increasing attention in MARL, especially in applications to safety-critical multi-agent systems (Zhang & Bastani, 2019; Qie et al., 2019; Shalev-Shwartz et al., 2016). Under the CBF framework, Cheng et al. (2020) considered the setting with unknown system dynamics and proposed robust multi-agent CBFs based on the learned dynamics. This mirrors the second category of single-agent learning-based safe control mentioned above, which is orthogonal to our focus. RL approaches have also been applied to multi-agent collision avoidance (Chen et al., 2017a; Lowe et al., 2017; Everett et al., 2018; Zhang et al., 2018). Nonetheless, no formal guarantees of safety were established in these works. One exception is Zhang & Bastani (2019), which proposed a multi-agent model predictive shielding algorithm that provably guarantees safety for any policy learned from MARL, and which differs from our multi-agent CBF-based approach. More importantly, none of these MARL-



[1] https://realm.mit.edu/blog/learning-safe-multi-agent-control-decentralized-neural-barrier-certificates

