EVALUATING ROBUSTNESS OF COOPERATIVE MARL: A MODEL-BASED APPROACH

Anonymous authors
Paper under double-blind review

Abstract

In recent years, a proliferation of methods has been developed for cooperative multi-agent reinforcement learning (c-MARL). However, the robustness of c-MARL agents against adversarial attacks has rarely been explored. In this paper, we propose to evaluate the robustness of c-MARL agents via a model-based approach, named c-MBA. Our proposed formulation can craft much stronger adversarial state perturbations on c-MARL agents to lower the total team reward than existing model-free approaches. In addition, we propose the first victim-agent selection strategy and the first data-driven approach to defining targeted failure states, each of which allows us to develop an even stronger adversarial attack without expert knowledge of the underlying environment. Our numerical experiments on two representative MARL benchmarks illustrate the advantage of our approach over other baselines: our model-based attack consistently outperforms the baselines in all tested environments.

1. INTRODUCTION

Deep neural networks are known to be vulnerable to adversarial examples, where a small and often imperceptible adversarial perturbation can easily fool state-of-the-art deep neural network classifiers (Szegedy et al., 2013; Nguyen et al., 2015; Goodfellow et al., 2014; Papernot et al., 2016). Since then, a wide variety of deep learning tasks have been shown to also be vulnerable to adversarial attacks, ranging from various computer vision tasks to natural language processing tasks (Jia & Liang, 2017; Zhang et al., 2020; Jin et al., 2020; Alzantot et al., 2018). Perhaps unsurprisingly, deep reinforcement learning (DRL) agents are also vulnerable to adversarial attacks, as first shown in (Huang et al., 2017) for Atari-game DRL agents. (Huang et al., 2017) study the effectiveness of adversarial examples on a policy network trained on Atari games in the setting where the attacker has access to the neural network of the victim policy. In (Lin et al., 2017), the authors further investigate a strategically-timed attack that perturbs victim agents on Atari games only at a subset of the time steps. Meanwhile, (Kos & Song, 2017) use the fast gradient sign method (FGSM) (Goodfellow et al., 2014) to generate adversarial perturbations on A3C agents (Mnih et al., 2016) and explore training with random noise and FGSM perturbations to improve resilience against adversarial examples. While the above research endeavors focus on discrete action spaces, another line of research tackles the more challenging problem of DRL with continuous action spaces (Weng et al., 2019; Gleave et al., 2019). Specifically, (Weng et al., 2019) consider a two-step algorithm that crafts adversarial perturbations to drive the agent closer to a targeted failure state using a learned dynamics model, and (Gleave et al., 2019) propose a physically realistic threat model and demonstrate the existence of adversarial policies in zero-sum simulated robotics games.
While most existing DRL attack algorithms focus on the single-agent setting, in this work we study the vulnerability of multi-agent DRL, which has been widely applied in many safety-critical real-world applications including swarm robotics (Dudek et al., 1993), electricity distribution, and traffic control (OroojlooyJadid & Hajinezhad, 2019). In particular, we focus on the cooperative multi-agent reinforcement learning (c-MARL) setting, where a group of agents is trained to generate joint actions that maximize the team reward. We note that c-MARL is a more challenging yet interesting setting than single-agent DRL, as one must also account for the interactions between agents, which further complicates the problem. Our main contributions are summarized as follows:
• We propose a model-based adversarial attack framework on c-MARL, which we name c-MBA (Model-Based Attack on c-MARL). We formulate the attack as a two-step process and solve for the adversarial state perturbation efficiently with existing proximal gradient methods. We show that our model-based attack is stronger and more effective than all existing model-free baselines. Besides, we propose a novel adaptive victim selection strategy and show that it can further increase the attack power of c-MBA by decreasing the team reward even more.
• To alleviate the dependence on knowledge of the c-MARL environment, we also propose the first data-driven approach to define the targeted failure state, based on the data collected for training the dynamics model. Our numerical experiments illustrate that c-MBA with the data-driven failure state is comparable to, and in many cases even outperforms, c-MBA with the expert-defined failure state. Therefore, our data-driven approach is a good proxy for the optimal failure state when we have little or no knowledge of the state space of the c-MARL environment.
• We show on both the multi-agent MuJoCo and multi-agent particle environments that our c-MBA consistently outperforms the state-of-the-art baselines in all tested environments. We show that c-MBA can reduce the team reward by up to 8-9× when attacking the c-MARL agents. In addition, c-MBA with the proposed victim selection strategy matches or outperforms other c-MBA variants in all environments, with up to 80% improvement in reducing the team reward.

Paper outline. Section 2 discusses related work on adversarial attacks for DRL and presents general background on the c-MARL setting. We describe our proposed attack framework c-MBA in Section 3.1 for a fixed set of victim agents. In addition, we propose an alternative data-driven approach to determine the failure state for c-MBA in Section 3.2. After that, we detail an adaptive strategy that designs a stronger attack by selecting the most vulnerable victim agents in Section 3.3. Section 4 presents the evaluation of our approach on several standard c-MARL benchmarks. Finally, we summarize our results and future directions in Section 5.
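To make the two-step idea concrete, the following is a minimal sketch of the core optimization: given a learned dynamics model and a victim policy, find a bounded observation perturbation that pushes the model-predicted next state toward a targeted failure state via projected gradient descent. All components here (the linear dynamics model, linear victim policy, and failure state `s_fail`) are toy stand-ins of our own choosing, not the paper's actual models.

```python
import numpy as np

# Toy stand-ins (assumptions for illustration): a "learned" linear dynamics
# model s' = A s + B a, a linear victim policy a = W s, and a hypothetical
# targeted failure state s_fail.
rng = np.random.default_rng(0)
dim_s, dim_a = 4, 2
A = 0.9 * np.eye(dim_s)
B = 0.1 * rng.normal(size=(dim_s, dim_a))
W = 0.1 * rng.normal(size=(dim_a, dim_s))
s_fail = np.ones(dim_s)

def predict_next(s, a):
    """Dynamics-model prediction of the next state."""
    return A @ s + B @ a

def attack(s, eps=0.5, steps=100, lr=0.1):
    """Projected gradient descent: find ||delta||_inf <= eps minimizing
    0.5 * ||predict_next(s, policy(s + delta)) - s_fail||^2."""
    delta = np.zeros(dim_s)
    for _ in range(steps):
        a = W @ (s + delta)               # victim acts on perturbed observation
        s_next = predict_next(s, a)       # environment still evolves from true s
        # Analytic gradient via the chain rule (everything is linear here):
        # d s_next / d delta = B W
        grad = (B @ W).T @ (s_next - s_fail)
        delta -= lr * grad
        delta = np.clip(delta, -eps, eps)  # project onto the l_inf ball
    return delta

s0 = np.zeros(dim_s)
delta = attack(s0)
base = np.linalg.norm(predict_next(s0, W @ s0) - s_fail)
adv = np.linalg.norm(predict_next(s0, W @ (s0 + delta)) - s_fail)
print(base, adv)  # the perturbed observation yields a next state nearer s_fail
```

In the paper's actual setting the dynamics model is a neural network and the constrained step is handled by proximal gradient methods rather than this simple clipping projection, but the structure (predict, differentiate through model and policy, project) is the same.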

2. RELATED WORK AND BACKGROUND

Related work. Most existing adversarial attacks on DRL agents target the single-agent setting (Huang et al., 2017; Lin et al., 2017; Kos & Song, 2017; Weng et al., 2019), while only two other works (Lin et al., 2020; Hu & Zhang, 2022) focus on the c-MARL setting. (Hu & Zhang, 2022) consider a different problem than ours: finding an optimally "sparse" attack that uses a minimal number of attack steps. (Lin et al., 2020) propose a two-step attack procedure to generate state perturbations in the c-MARL setting, which is the most relevant to our work.



Figure 1: Illustration that our proposed model-based attack succeeds while model-free attacks fail when attacking Agent 0 in the Ant (4x2) environment. Under our c-MBA attack, the agent flips over and the episode ends after 440 time steps, demonstrating the effectiveness of our algorithm.

