BOUNDED MYOPIC ADVERSARIES FOR DEEP REINFORCEMENT LEARNING AGENTS

Abstract

Adversarial attacks against deep neural networks have been widely studied. Adversarial examples for deep reinforcement learning (DeepRL) agents have significant security implications due to the deployment of these algorithms in many application domains. In this work we formalize an optimal myopic adversary for deep reinforcement learning agents. Our adversary attempts to find a bounded perturbation of the state which minimizes the value of the action taken by the agent. We show with experiments in various games in the Atari environment that our attack formulation achieves significantly larger impact than the current state of the art. Furthermore, this enables us to lower, by several orders of magnitude, the bound on the perturbation needed to efficiently achieve a significant impact on DeepRL agents.

1. INTRODUCTION

Deep Neural Networks (DNNs) have become a powerful tool and are currently widely used in speech recognition (Hannun et al., 2014), computer vision (Krizhevsky et al., 2012), natural language processing (Sutskever et al., 2014), and self-learning systems such as deep reinforcement learning agents (Mnih et al. (2015), Mnih et al. (2016), Schulman et al. (2015), Lillicrap et al. (2015)). Along with the overwhelming success of DNNs in various domains there has also been a line of research investigating their weaknesses. Szegedy et al. (2014) observed that adding imperceptible perturbations to images can lead a DNN to misclassify the input image. The authors argue that the existence of these so-called adversarial examples is a form of overfitting: they hypothesize that a very complicated neural network behaves well on the training set but nonetheless performs poorly on the test set, enabling exploitation by an attacker. However, they discovered that different DNN models misclassified the same adversarial examples and assigned them the same class instead of making random mistakes. This led Goodfellow et al. (2015) to propose that the DNN models were actually learning approximately linear functions, resulting in underfitting the data. Mnih et al. (2015) introduced the use of DNNs as function approximators in reinforcement learning, improving the state of the art in this area. Because deep reinforcement learning agents utilize DNNs, they are also susceptible to this type of adversarial example. Currently, deep reinforcement learning has been applied to many areas such as network system control (Jay et al. (2019), Chu et al. (2020), Chinchali et al. (2018)), financial trading (Noonan (2017)), blockchain protocol security (Hou et al. (2019)), grid operation and security (Duan et al. (2019), Huang et al. (2019)), cloud computing (Chen et al. (2018)), robotics (Gu et al. (2017), Kalashnikov et al. (2018)), autonomous driving (Dosovitskiy et al. (2017)), and medical treatment and diagnosis (Tseng et al. (2017), Popova et al. (2018), Thananjeyan et al. (2017), Daochang & Jiang (2018), Ghesu et al. (2017)). A particular scenario where adversarial perturbations might be of significant interest is a financial trading market where the DeepRL agent is trained on observations consisting of the order book. In such a setting it is possible to compromise the whole trading system with an extremely small set of adversarial perturbations.
In particular, the ℓ1-norm bounded perturbations discussed in our paper have sparse solutions, and thus can be used as the basis for an attack in such a scenario. Moreover, the magnitude of the ℓ1-norm bounded perturbations produced by our attack is orders of magnitude smaller than that of previous approaches, and thus our proposed perturbations yield a stealthier attack that is more likely to evade automatic anomaly detection schemes.
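The sparsity of ℓ1-bounded perturbations can be seen in closed form: maximizing a linear (first-order) objective g·δ subject to ||δ||1 ≤ ε puts the whole budget on the single coordinate with the largest |gradient|. The following numpy sketch (ours, not from the paper) illustrates this; the 84×84 frame size is only an Atari-flavored example.

```python
import numpy as np

def l1_steepest_step(grad, eps):
    """Solve max_{||d||_1 <= eps} grad . d in closed form.

    The maximizer spends the entire eps budget on the one coordinate
    with the largest |gradient|, so the perturbation is 1-sparse.
    """
    d = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    d[i] = eps * np.sign(grad[i])
    return d

rng = np.random.default_rng(0)
g = rng.normal(size=84 * 84)      # gradient w.r.t. a flattened game frame
delta = l1_steepest_step(g, eps=0.01)
print(np.count_nonzero(delta))    # -> 1: the attack touches a single pixel
```

This is why an ℓ1-bounded adversary naturally perturbs very few observation entries (e.g., a handful of order-book levels), in contrast to ℓ∞ attacks, which perturb every coordinate.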


Considering the wide spectrum of domains in which deep reinforcement learning algorithms are deployed, it is crucial to investigate the resilience of these algorithms before they are used in real-world applications. Moreover, adversarial formulations are a first step towards understanding these algorithms and building generalizable, reliable, and robust deep reinforcement learning agents. Therefore, in this paper we study adversarial attack formulations for deep reinforcement learning agents and make the following contributions:

• We define the optimal myopic adversary, whose aim is to minimize the value of the action taken by the agent in each state, and formulate the optimization problem that this adversary seeks to solve.
• We introduce a differentiable approximation of the optimal myopic adversarial formulation.
• We compare the impact of our attack formulation to that of previous formulations in different games in the Atari environment.
• We show that the new formulation finds a better direction for the adversarial perturbation and increases the attack impact for bounded perturbations. (Conversely, our formulation decreases the magnitude of the perturbation required to efficiently achieve a significant impact.)
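The first contribution can be sketched in a few lines: the myopic adversary seeks a bounded perturbation that minimizes the value of the action the agent takes. The toy sketch below uses a hypothetical linear Q-function as a stand-in for the agent's network, an ℓ∞ budget, and plain projected gradient descent; the paper's actual attack uses a differentiable approximation and different norm bounds, so this is illustrative only.

```python
import numpy as np

# Hypothetical linear Q-function Q(s, a) = W[a] . s standing in for the
# agent's network; the attack logic is the same for any differentiable Q.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))          # 4 actions, 8-dimensional state

def myopic_attack(s, eps, steps=50, lr=0.05):
    """Projected gradient descent on Q(s_adv, a*) under an l-inf budget,
    where a* is the action the agent takes in the clean state."""
    a_star = int(np.argmax(W @ s))
    s_adv = s.copy()
    for _ in range(steps):
        grad = W[a_star]             # dQ(s_adv, a*)/ds_adv for a linear Q
        s_adv = s_adv - lr * np.sign(grad)
        s_adv = s + np.clip(s_adv - s, -eps, eps)   # project into the ball
    return s_adv, a_star

s = rng.normal(size=8)
s_adv, a = myopic_attack(s, eps=0.1)
print(W[a] @ s_adv < W[a] @ s)       # -> True: the taken action's value drops
```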

Goodfellow et al. (2015) introduced the fast gradient method (FGM), which computes an adversarial example as

x* = x + ε · ∇x J(x, y) / ||∇x J(x, y)||p,

where ε bounds the magnitude of the perturbation.
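A minimal numpy sketch of the FGM update above (the toy cost function and its gradient are ours, chosen only so the example is self-contained):

```python
import numpy as np

def fgm(x, grad_J, eps, p=2):
    """Fast gradient method: step of size eps along the p-norm-normalized
    gradient direction, x* = x + eps * g / ||g||_p."""
    g = grad_J(x)
    if p == np.inf:
        return x + eps * np.sign(g)  # the familiar FGSM special case
    return x + eps * g / (np.linalg.norm(g, ord=p) + 1e-12)

# Toy cost J(x) = ||x - t||^2 for a fixed target t, so grad_J(x) = 2(x - t).
t = np.ones(5)
grad_J = lambda x: 2.0 * (x - t)
x = np.zeros(5)
x_adv = fgm(x, grad_J, eps=0.3)
print(np.linalg.norm(x_adv - x))     # -> ~0.3 (perturbation has l2 norm eps)
```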

2.3 ADVERSARIAL ATTACK FORMULATIONS

In a bounded attack formulation for deep reinforcement learning, the aim is to find a perturbed state s_adv inside the ball Bp(s, ε) = {s̄ : ||s̄ − s||p ≤ ε} centered at the original state s, such that the agent's performance is degraded as much as possible.
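Iterative attacks enforce this constraint by projecting each candidate back into the ball; closed-form projections exist for p = 2 and p = ∞. A small numpy sketch (ours, not from the paper):

```python
import numpy as np

def project_to_ball(s_adv, s, eps, p=np.inf):
    """Project a candidate perturbed state back into the feasible set
    {s' : ||s' - s||_p <= eps}; closed forms for p = 2 and p = inf."""
    d = s_adv - s
    if p == np.inf:
        return s + np.clip(d, -eps, eps)
    if p == 2:
        n = np.linalg.norm(d)
        return s + d * (eps / n) if n > eps else s_adv
    raise NotImplementedError("projection implemented for p in {2, inf}")

s = np.zeros(4)
candidate = np.full(4, 3.0)            # far outside the ball
projected = project_to_ball(candidate, s, eps=1.0, p=2)
print(np.linalg.norm(projected - s))   # -> 1.0
```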




2.1 ADVERSARIAL REINFORCEMENT LEARNING

Adversarial reinforcement learning is an active line of research directed towards discovering the weaknesses of deep reinforcement learning algorithms. Gleave et al. (2020) model the interaction between the agent and the adversary as a two-player Markov game and solve the reinforcement learning problem for the adversary via Proximal Policy Optimization, introduced by Schulman et al. (2017). They fix the victim agent's policy and only allow the adversary to take natural actions to disrupt the agent instead of using ℓp-norm bounded pixel perturbations. Pinto et al. (2017) model the adversary and the victim as a two-player zero-sum discounted Markov game and train the victim in the presence of the adversary to make the victim more robust. Mandlekar et al. (2017) use gradient-based perturbations, which make the agent more robust than random perturbations do. Huang et al. (2017) and Kos & Song (2017) use the fast gradient sign method (FGSM) to show that deep reinforcement learning agents are vulnerable to adversarial perturbations. Pattanaik et al. (2018) use a gradient-based formulation to increase the robustness of deep reinforcement learning agents.

FGM crafts adversarial examples for image classification by taking the gradient, with respect to the input, of the cost function J(x, y) used to train the neural network. Here x is the input image and y is the output label. As mentioned in the previous section, FGM was first adapted to the deep reinforcement learning setting by Huang et al. (2017). Subsequently, Pattanaik et al. (2018) introduced a variant of FGM in which a few random samples are taken in the gradient direction and the best is chosen. However, the main difference between the approaches of Huang et al. (2017) and Pattanaik et al. (2018) is in the choice of the cost function J used to determine the gradient direction. In the next section we outline the cost functions used in these two formulations.
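The sampled variant can be sketched as follows. This is a numpy illustration in the spirit of Pattanaik et al. (2018), not their exact sampling scheme; the toy cost function is ours.

```python
import numpy as np

def sampled_fgm(x, J, grad_J, eps, n_samples=10, seed=0):
    """Draw several step sizes along the normalized gradient direction of
    the cost J and keep the candidate with the highest cost. Illustrative
    only; the published method differs in its sampling details."""
    rng = np.random.default_rng(seed)
    g = grad_J(x)
    direction = g / (np.linalg.norm(g) + 1e-12)
    candidates = [x + rng.uniform(0.0, eps) * direction
                  for _ in range(n_samples)]
    return max(candidates, key=lambda c: float(J(c)))

# Toy cost J(x) = ||x||^2, so moving along +grad strictly increases J.
J = lambda x: float(np.sum(x ** 2))
grad_J = lambda x: 2.0 * x
x = np.ones(3)
x_adv = sampled_fgm(x, J, grad_J, eps=0.5)
print(J(x_adv) > J(x))               # -> True
```

Taking the best of several random step sizes guards against a single fixed step overshooting or undershooting the most damaging perturbation.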

