DPMAC: DIFFERENTIALLY PRIVATE COMMUNICATION FOR COOPERATIVE MULTI-AGENT REINFORCEMENT LEARNING

Abstract

Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concerns have not been considered in existing works in MARL. We propose the differentially private multi-agent communication (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender with a rigorous (ϵ, δ)-differential privacy (DP) guarantee. In contrast to directly perturbing messages with predefined DP noise, as is commonly done in privacy-preserving scenarios, we equip each agent with a stochastic message sender and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.

1. INTRODUCTION

Multi-agent reinforcement learning (MARL) has shown remarkable achievements in many real-world applications such as sensor networks (Zhang & Lesser, 2011), autonomous driving (Shalev-Shwartz et al., 2016b), and traffic control (Wei et al., 2019). To mitigate the non-stationarity of training multi-agent systems, the centralized training and decentralized execution (CTDE) paradigm has been proposed. However, CTDE still struggles to enable complex cooperation and coordination among agents during execution due to the inherent partial observability of multi-agent scenarios (Wang et al., 2020b). To help agents cooperate more efficiently in complex, partially observable environments, communication between agents has been considered. Numerous works have proposed differentiable communication methods that can be trained end-to-end for more efficient cooperation among agents (Foerster et al., 2016; Jiang & Lu, 2018; Das et al., 2019; Ding et al., 2020; Kim et al., 2021; Wang et al., 2020b). The communication can be either broadcast (Das et al., 2019; Jiang & Lu, 2018; Wang et al., 2020b), where the connections between agents form a complete graph, or one-to-one over a general graph (Ding et al., 2020). However, the advantages of communication, which stem from full information sharing, come with possible privacy leakage of individual agents for both broadcast and one-to-one messages. Therefore, in practice, an agent may be unwilling to fully share its private information with other agents even in cooperative scenarios. For instance, consider training and deploying an MARL-based autonomous driving system: each autonomous vehicle involved can be regarded as an agent, and all vehicles work together to improve the safety and efficiency of the system, making this a cooperative MARL scenario (Shalev-Shwartz et al., 2016a; Yang et al., 2020).
However, owners of autonomous vehicles may not allow their vehicles to send private information to other vehicles without any desensitization, since this may divulge sensitive details such as their personal life routines (Hassan et al., 2020). Hence, a natural question arises: Can an MARL algorithm with communication under the CTDE framework be endowed with both a rigorous privacy guarantee and empirical efficiency? To answer this question, we start with a simple motivating example called single round binary sums, where several players attempt to guess the bits possessed by others and can share their own information via communication. In Section 4, we show that a local message sender using the randomized response mechanism allows an analytical receiver to correctly calculate the binary sum in a privacy-preserving way. From this example we gain two insights: 1) information should not be aggregated as in previous communication methods in MARL (Das et al., 2019; Ding et al., 2020), since a trusted data curator is not available in general; instead, privacy should be achieved locally for every agent; 2) once the agents know a priori that a certain privacy constraint exists, they can adjust their inference on the noised messages. These two insights indicate the principles of our privacy-preserving communication structure: we desire a privacy-preserving local sender and a privacy-aware analytical receiver.

Our algorithm, differentially private multi-agent communication (DPMAC), instantiates these principles. More specifically, for the sender part, each agent is equipped with a local sender which ensures differential privacy (DP) (Dwork, 2006) by adding Gaussian noise. The message sender in DPMAC is local in the sense that each agent is equipped with its own message sender, which is only used to send its own messages.
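To make the randomized-response intuition concrete, the following minimal sketch (function names are hypothetical, not from the paper) shows a local sender that reports each bit under ϵ-local differential privacy, and a privacy-aware receiver that, knowing the flipping probability a priori, debiases the noisy sum:

```python
import math
import random

def randomized_response(bit, eps):
    """Local sender: report the true bit with probability e^eps / (1 + e^eps),
    otherwise flip it. This satisfies eps-local differential privacy."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if random.random() < p_truth else 1 - bit

def debiased_sum(reports, eps):
    """Privacy-aware receiver: since the flipping probability is known a
    priori, invert it to obtain an unbiased estimate of the true binary sum."""
    n = len(reports)
    p = math.exp(eps) / (1.0 + math.exp(eps))  # P(report = true bit)
    q = 1.0 - p                                # P(report = flipped bit)
    # E[report_i] = q + (p - q) * bit_i, so solve for the sum of bit_i:
    return (sum(reports) - n * q) / (p - q)
```

With many players the estimate concentrates around the true sum, while each individual report retains plausible deniability about its underlying bit.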
Equipped with this local sender, DPMAC is able not only to protect the privacy of communication between agents but also to satisfy the different privacy levels required by different agents. In addition, the sender adopts a Gaussian distribution to represent the message space and samples stochastic messages from the learned distribution. However, it is known that DP noise may impede the original learning process (Dwork et al., 2014; Alvim et al., 2011), resulting in unstable or even divergent algorithms, especially for deep-learning-based methods (Abadi et al., 2016; Chen et al., 2020). To cope with this issue, we incorporate the noise variance into the representation of the message distribution, so that the agents can learn to adjust the message distribution automatically according to varying noise scales.

For the receiver part, thanks to the gradient chain between the sender and the receiver, our receiver naturally utilizes the privacy-relevant information hidden in the gradients. This implements the privacy-aware analytical receiver described in the motivating example.

When privacy-preserving communication is required in a cooperative game, the game is no longer purely cooperative, since each player faces a trade-off between the team utility and its personal privacy. To analyze the convergence of cooperative games with privacy-preserving communication, we first define a single-step game, namely the collaborative game with privacy (CGP). We prove that under some mild assumptions on the players' value functions, CGP can be transformed into a potential game (Monderer & Shapley, 1996), which implies the existence of a Nash equilibrium (NE). With this property, an NE can also be shown to exist in the single round binary sums game. Furthermore, we extend the single round binary sums into a multi-step game called multiple round sums using the notion of Markov potential games (MPGs) (Leonardos et al., 2021). Inspired by Macua et al.
(2018) and modeling the privacy-preserving communication as part of the agent action, we prove the existence of an NE, which indicates that the multi-step game with privacy-preserving communication is learnable. To validate the effectiveness of DPMAC, extensive experiments are conducted in the multi-agent particle environment (MPE) (Lowe et al., 2017), including cooperative navigation, cooperative communication and navigation, and predator-prey. Specifically, in privacy-preserving scenarios, DPMAC significantly outperforms baselines. Moreover, even without any privacy constraints, DPMAC achieves competitive performance against baselines. To sum up, the contributions of this work are threefold:

• To the best of our knowledge, we make the first attempt to develop a framework for private communication in MARL, named DPMAC, with a theoretical guarantee of (ϵ, δ)-DP.

• We prove the existence of the Nash equilibrium for cooperative games with privacy-preserving communication, which shows that these games are learnable.
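As a concrete sketch of how a local sender can attain (ϵ, δ)-DP with additive Gaussian noise: the classic analytic Gaussian mechanism calibrates the noise scale to the clipped ℓ2 norm of the message (its sensitivity). The function names, the clipping step, and the sampling interface below are illustrative assumptions rather than DPMAC's exact implementation:

```python
import math
import numpy as np

def gaussian_mechanism_sigma(sensitivity, eps, delta):
    """Noise scale of the classic Gaussian mechanism, which yields
    (eps, delta)-DP for a query with the given L2 sensitivity
    (this standard bound is valid for eps <= 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def send_private_message(mu, learned_std, sensitivity, eps, delta, rng):
    """Hypothetical local sender: sample a stochastic message from the
    learned Gaussian, clip its L2 norm to bound the sensitivity, then add
    calibrated DP noise. A learned sender can fold sigma_dp into its
    message distribution to counteract the instability caused by the noise."""
    sigma_dp = gaussian_mechanism_sigma(sensitivity, eps, delta)
    msg = rng.normal(mu, learned_std)  # stochastic message sample
    msg = msg * min(1.0, sensitivity / (np.linalg.norm(msg) + 1e-12))  # clip
    return msg + rng.normal(0.0, sigma_dp, size=msg.shape)  # DP perturbation
```

Because the noise scale grows as ϵ shrinks, a sender that is aware of sigma_dp can learn a message distribution whose signal survives the perturbation, which is the intuition behind incorporating the noise variance into the message representation.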

