EVALUATING ROBUSTNESS OF COOPERATIVE MARL: A MODEL-BASED APPROACH

Anonymous authors
Paper under double-blind review

Abstract

In recent years, a proliferation of methods has been developed for cooperative multi-agent reinforcement learning (c-MARL). However, the robustness of c-MARL agents against adversarial attacks has rarely been explored. In this paper, we propose to evaluate the robustness of c-MARL agents via a model-based approach, named c-MBA. Our proposed formulation can craft much stronger adversarial state perturbations than existing model-free approaches, lowering the total team reward further. In addition, we propose the first victim-agent selection strategy and the first data-driven approach to defining targeted failure states, each of which allows us to develop even stronger adversarial attacks without expert knowledge of the underlying environment. Our numerical experiments on two representative MARL benchmarks illustrate the advantage of our approach over other baselines: our model-based attack consistently outperforms other baselines in all tested environments.

1. INTRODUCTION

Deep neural networks are known to be vulnerable to adversarial examples, where a small and often imperceptible adversarial perturbation can easily fool state-of-the-art deep neural network classifiers (Szegedy et al., 2013; Nguyen et al., 2015; Goodfellow et al., 2014; Papernot et al., 2016). Since then, a wide variety of deep learning tasks have been shown to also be vulnerable to adversarial attacks, ranging from computer vision to natural language processing (Jia & Liang, 2017; Zhang et al., 2020; Jin et al., 2020; Alzantot et al., 2018). Perhaps unsurprisingly, deep reinforcement learning (DRL) agents are also vulnerable to adversarial attacks, as first shown in (Huang et al., 2017) for DRL agents playing Atari games. (Huang et al., 2017) study the effectiveness of adversarial examples on a policy network trained on Atari games when the attacker has access to the neural network of the victim policy. In (Lin et al., 2017), the authors further investigate a strategically-timed attack that perturbs the victim agent only at a subset of time steps. Meanwhile, (Kos & Song, 2017) use the fast gradient sign method (FGSM) (Goodfellow et al., 2014) to generate adversarial perturbations against A3C agents (Mnih et al., 2016) and explore training with random noise and FGSM perturbations to improve resilience against adversarial examples. While the above research efforts focus on discrete action spaces, another line of research tackles the more challenging problem of DRL with continuous action spaces (Weng et al., 2019; Gleave et al., 2019). Specifically, (Weng et al., 2019) consider a two-step algorithm that crafts adversarial perturbations driving the agent closer to a targeted failure state using a learned dynamics model, and (Gleave et al., 2019) propose a physically realistic threat model and demonstrate the existence of adversarial policies in zero-sum simulated robotics games.
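As a concrete reference point for the gradient-based attacks discussed above, the following is a minimal FGSM-style sketch in PyTorch against a toy discrete-action policy network. The policy architecture, the perturbation budget, and the choice of loss are illustrative assumptions of ours, not the exact setup used by any of the cited works.

```python
import torch
import torch.nn as nn

def fgsm_perturb(policy, state, action_idx, epsilon):
    """Craft an FGSM perturbation of the input state.

    The state is moved one signed-gradient step in the direction that
    increases the cross-entropy loss of the agent's chosen action,
    bounded by epsilon in the l-infinity norm.
    """
    state = state.clone().detach().requires_grad_(True)
    logits = policy(state)
    # Loss: negative log-likelihood of the currently chosen action.
    loss = nn.functional.cross_entropy(logits.unsqueeze(0),
                                       torch.tensor([action_idx]))
    loss.backward()
    # Single signed-gradient ascent step on the loss.
    return (state + epsilon * state.grad.sign()).detach()

# Toy policy: 4-dimensional state -> 3 discrete actions.
torch.manual_seed(0)
policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
s = torch.randn(4)
a = policy(s).argmax().item()
s_adv = fgsm_perturb(policy, s, a, epsilon=0.1)
```

By construction, `s_adv` stays within an l-infinity ball of radius 0.1 around the clean state `s`.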
While most existing DRL attack algorithms focus on the single-agent setting, in this work we study the vulnerability of multi-agent DRL, which has been widely applied in many safety-critical real-world applications including swarm robotics (Dudek et al., 1993), electricity distribution, and traffic control (OroojlooyJadid & Hajinezhad, 2019). In particular, we focus on the cooperative multi-agent reinforcement learning (c-MARL) setting, where a group of agents is trained to generate joint actions that maximize the team reward. We note that c-MARL is a more challenging yet interesting setting than single-agent DRL, as one must also account for the interactions between agents, which makes the problem more complicated.
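To make the cooperative setting concrete, below is a deliberately tiny illustration, entirely our own toy construction rather than an environment from this paper, of why perturbing even a single agent's observation can collapse a shared team reward in c-MARL:

```python
def team_reward(a1, a2):
    # Cooperative reward: the team scores only when the joint actions match.
    return 1.0 if a1 == a2 else 0.0

def agent_policy(obs):
    # Each agent picks action 1 iff its (scalar) observation is positive.
    return 1 if obs > 0 else 0

true_state = 0.4  # both agents observe the same underlying state
clean = team_reward(agent_policy(true_state), agent_policy(true_state))

# The adversary perturbs only agent 1's observation (the "victim agent")
# with a bounded perturbation of magnitude eps = 0.5.
eps = 0.5
attacked = team_reward(agent_policy(true_state - eps),
                       agent_policy(true_state))

print(clean, attacked)  # 1.0 0.0
```

Because the reward is shared, flipping one agent's action breaks the coordination of the whole team, which is exactly the interaction effect that makes attacking (and defending) c-MARL harder than the single-agent case.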

