DPMAC: DIFFERENTIALLY PRIVATE COMMUNICATION FOR COOPERATIVE MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concerns have not been considered in existing MARL works. We propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender carrying a rigorous (ϵ, δ)-differential privacy (DP) guarantee. In contrast to directly perturbing messages with predefined DP noise, as is commonly done in privacy-preserving scenarios, we equip each agent with a stochastic message sender and incorporate the DP requirement into the sender itself, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
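As a point of reference for the baseline the abstract contrasts against, directly perturbing a message with predefined DP noise typically means applying the standard Gaussian mechanism: the sender adds zero-mean Gaussian noise whose standard deviation is calibrated to the message's sensitivity and the target (ϵ, δ) budget. The sketch below illustrates this baseline only, not DPMAC's learned stochastic sender; the function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_mechanism(message, sensitivity, epsilon, delta, rng=None):
    """Perturb a message vector with noise calibrated for (epsilon, delta)-DP.

    Uses the classic Gaussian-mechanism calibration
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    valid for epsilon in (0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return message + rng.normal(0.0, sigma, size=np.shape(message))

# Example: perturb a 4-dimensional message before broadcasting it.
noisy = gaussian_mechanism(np.zeros(4), sensitivity=1.0,
                           epsilon=0.5, delta=1e-5,
                           rng=np.random.default_rng(0))
```

Because sigma here is fixed in advance, the noise scale ignores what the agents have learned to communicate; DPMAC's motivation is precisely that a learned message distribution can absorb the DP requirement and reduce the resulting training instability.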

1. INTRODUCTION

Multi-agent reinforcement learning (MARL) has shown remarkable achievements in many real-world applications such as sensor networks (Zhang & Lesser, 2011), autonomous driving (Shalev-Shwartz et al., 2016b), and traffic control (Wei et al., 2019). To mitigate non-stationarity when training the multi-agent system, the centralized training and decentralized execution (CTDE) paradigm has been proposed. However, the CTDE paradigm still struggles to enable complex cooperation and coordination among agents during execution due to the inherent partial observability of multi-agent scenarios (Wang et al., 2020b). To help agents cooperate more efficiently in complex partially observable environments, communication between agents has been considered. Numerous works propose differentiable communication methods, trainable in an end-to-end manner, for more efficient cooperation among agents (Foerster et al., 2016; Jiang & Lu, 2018; Das et al., 2019; Ding et al., 2020; Kim et al., 2021; Wang et al., 2020b). The communication can be either broadcast (Das et al., 2019; Jiang & Lu, 2018; Wang et al., 2020b), where the connections between agents form a complete graph, or one-to-one, forming a general graph (Ding et al., 2020). However, the advantages of communication, which stem from full information sharing, come with possible privacy leakage for individual agents, for both broadcast and one-to-one messages. In practice, an agent may therefore be unwilling to fully share its private information with other agents, even in cooperative scenarios. For instance, consider training and deploying an MARL-based autonomous driving system: each autonomous vehicle involved in this system can be regarded as an agent, and all vehicles work together to improve the safety and efficiency of the system, making this a cooperative MARL scenario (Shalev-Shwartz et al., 2016a; Yang et al., 2020).
However, owners of autonomous vehicles may not allow their vehicles to send private information to other vehicles without any desensitization, since this may divulge sensitive information such as their personal routines (Hassan et al., 2020). Hence, a natural question arises: Can an MARL algorithm with communication under the CTDE framework be endowed with both a rigorous privacy guarantee and empirical efficiency?

