ROBUST MULTI-AGENT REINFORCEMENT LEARNING DRIVEN BY CORRELATED EQUILIBRIUM

Abstract

In this paper we deal with robust cooperative multi-agent reinforcement learning (CMARL). While CMARL has many potential applications, only a trained policy that is sufficiently robust can be confidently deployed in the real world. Existing works on robust MARL mainly apply vanilla adversarial training within the centralized-training decentralized-execution paradigm. However, we find that if a CMARL environment contains an adversarial agent, policies at a decentralized equilibrium may perform significantly poorly, making such adversarial robustness hard to achieve. To tackle this issue, we suggest that at execution time the non-adversarial agents should make their decisions jointly to improve robustness, i.e., solve for a correlated equilibrium instead. We theoretically demonstrate the superiority of the correlated equilibrium over the decentralized one in adversarial MARL settings. Therefore, to achieve robust CMARL, we introduce novel strategies that encourage agents to learn a correlated equilibrium while maximally preserving the convenience of decentralized execution. Global variables combined with mutual information are proposed to help agents learn robust policies with standard MARL algorithms. Experimental results show that our method can dramatically boost performance on the SMAC environments.

1. INTRODUCTION

Recently, reinforcement learning (RL) has achieved remarkable success in many practical sequential decision problems, such as Go (Silver et al., 2017), chess (Silver et al., 2018), real-time strategy games (Vinyals et al., 2019), etc. In the real world, many sequential decision problems involve more than one decision maker (i.e., multiple agents), such as autonomous driving, traffic light control, and network routing. Cooperative multi-agent reinforcement learning (CMARL) is a key framework for solving these practical problems. Existing MARL methods for cooperative environments include policy-based methods, e.g. MADDPG (Lowe et al., 2017), COMA (Foerster et al., 2017), and value-based methods, e.g. VDN (Sunehag et al., 2018), QMIX (Rashid et al., 2018), QTRAN (Son et al., 2019). However, before we actually deploy a CMARL policy in real-world applications, a question must be asked: are these learned policies safe and robust enough to be deployed? What will happen if some agents make mistakes or behave adversarially against the other agents? Most likely, the entire team will fail to achieve its goal or perform extremely poorly. Lin et al. (2020) demonstrate this lack of robustness in CMARL environments, where a learned adversarial policy for a single agent can drastically degrade the team's performance. Therefore, in practice, we expect a multi-agent team policy in a fully cooperative environment to remain robust even when some agent(s) make mistakes or behave adversarially. To the best of our knowledge, the few existing works on this issue mainly use vanilla adversarial training. Klima et al. (2018) considered a two-agent cooperative case in which, to make the policy robust, agents become competitive with a certain probability during training. Li et al. (2019) provided a robust MADDPG approach called M3DDPG, where each agent optimizes its policy assuming the other agents take perturbed, sub-optimal actions.
Most state-of-the-art MARL algorithms follow the centralized training and decentralized execution (CTDE) routine, since this setting matches many real-world deployment constraints. The robust MARL method M3DDPG also follows the CTDE setting. However, existing works on team max-min normal-form or extensive-form games show that if the environment contains an adversarial agent, then the
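To give intuition for this gap, consider the following toy example (our own illustration, not taken from the paper or the cited works): two cooperating agents each pick an action in {0, 1} and win only if they match each other while avoiding an adversary's guess. With independent (decentralized) policies, the team's max-min win probability is 0.25; with a shared random signal (a correlated strategy), the agents can always match, raising it to 0.5. A short sketch verifying this numerically over a grid of independent policies:

```python
import itertools

# Toy team game (illustrative only): two cooperating agents each pick
# an action in {0, 1}; an adversary also picks an action in {0, 1}.
# The team scores 1 only if both agents match each other AND their
# common action differs from the adversary's.

def team_win_prob(p1, p2, adv_action):
    """Win probability when the agents independently play action 0
    with probabilities p1, p2 and the adversary plays adv_action."""
    target = 1 - adv_action        # the action the team must coordinate on
    prob_both_0 = p1 * p2
    prob_both_1 = (1 - p1) * (1 - p2)
    return prob_both_0 if target == 0 else prob_both_1

grid = [i / 100 for i in range(101)]

# Decentralized (independent) policies: maximize, over (p1, p2), the
# worst case over the adversary's action.
indep_value = max(
    min(team_win_prob(p1, p2, a) for a in (0, 1))
    for p1, p2 in itertools.product(grid, repeat=2)
)

# Correlated strategy: a shared random bit lets both agents play the
# same uniformly random action, so they always match and the adversary
# blocks them only half the time.
corr_value = 0.5

print(f"independent max-min value: {indep_value:.2f}")  # 0.25
print(f"correlated value:          {corr_value:.2f}")   # 0.50
```

The shared random bit acts exactly like the correlating signal discussed above: it carries no information the adversary can exploit, yet it lets the non-adversarial agents coordinate perfectly.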

