BOUNDED MYOPIC ADVERSARIES FOR DEEP REINFORCEMENT LEARNING AGENTS

Abstract

Adversarial attacks against deep neural networks have been widely studied. Adversarial examples for deep reinforcement learning (DeepRL) agents have significant security implications, given the deployment of these algorithms in many application domains. In this work we formalize an optimal myopic adversary for deep reinforcement learning agents. Our adversary attempts to find a bounded perturbation of the state which minimizes the value of the action taken by the agent. We show with experiments in various games in the Atari environment that our attack formulation achieves a significantly larger impact than the current state of the art. Furthermore, this enables us to lower by several orders of magnitude the bound on the perturbation needed to efficiently achieve a significant impact on DeepRL agents.

1. INTRODUCTION

Deep Neural Networks (DNNs) have become a powerful tool and are currently widely used in speech recognition (Hannun et al., 2014), computer vision (Krizhevsky et al., 2012), natural language processing (Sutskever et al., 2014), and self-learning systems such as deep reinforcement learning agents (Mnih et al. (2015), Mnih et al. (2016), Schulman et al. (2015), Lillicrap et al. (2015)). Along with the overwhelming success of DNNs in various domains there has also been a line of research investigating their weaknesses. Szegedy et al. (2014) observed that adding imperceptible perturbations to images can lead a DNN to misclassify the input image. The authors argue that the existence of these so-called adversarial examples is a form of overfitting. In particular, they hypothesize that a very complicated neural network behaves well on the training set but nonetheless performs poorly on the test set, enabling exploitation by an attacker. However, they discovered that different DNN models misclassified the same adversarial examples and assigned them the same class, instead of making random mistakes. This led Goodfellow et al. (2015) to propose that DNN models were actually learning approximately linear functions, resulting in underfitting the data.

Mnih et al. (2015) introduced the use of DNNs as function approximators in reinforcement learning, improving the state of the art in this area. Because these deep reinforcement learning agents utilize DNNs, they are also susceptible to this type of adversarial example. Currently, deep reinforcement learning has been applied to many areas such as network system control (Jay et al. (2019), Chu et al. (2020), Chinchali et al. (2018)), financial trading (Noonan (2017)), blockchain protocol security (Hou et al. (2019)), grid operation and security (Duan et al. (2019), Huang et al. (2019)), cloud computing (Chen et al. (2018)), robotics (Gu et al. (2017), Kalashnikov et al. (2018)), autonomous driving (Dosovitskiy et al. (2017)), and medical treatment and diagnosis (Tseng et al. (2017), Popova et al. (2018), Thananjeyan et al. (2017), Daochang & Jiang (2018), Ghesu et al. (2017)). A scenario where adversarial perturbations are of particular interest is a financial trading market in which the DeepRL agent is trained on observations consisting of the order book. In such a setting it is possible to compromise the whole trading system by perturbing an extremely small subset of the observation. In particular, the ℓ1-norm bounded perturbations discussed in our paper have sparse solutions, and thus can be used as a basis for an attack in such a scenario.
Moreover, the magnitude of the ℓ1-norm bounded perturbations produced by our attack is orders of magnitude smaller than that of previous approaches, and thus our proposed perturbations result in a stealthier attack that is more likely to evade automatic anomaly detection schemes.
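To make the myopic attack formulation concrete, the following is a minimal sketch, not the paper's implementation: an adversary facing a toy linear Q-function searches for an ℓ∞-bounded perturbation of the state that lowers the Q-value of the action the agent would take in the clean state. The linear Q-function `W @ s`, the bound `epsilon`, and the gradient-sign step are all illustrative assumptions.

```python
import numpy as np

def myopic_attack(W, s, epsilon):
    """Myopic adversary for a toy linear Q-function Q(s) = W @ s.

    Returns an l-infinity bounded perturbation delta
    (||delta||_inf <= epsilon) that minimizes Q(s + delta, a*),
    where a* is the action the agent takes in the clean state s.
    """
    q_clean = W @ s                   # Q-values for the clean state
    a_star = int(np.argmax(q_clean))  # action the agent would take
    grad = W[a_star]                  # gradient of Q(s, a*) w.r.t. s
    # Steepest descent under an l-infinity constraint: move every
    # state component by epsilon against the sign of the gradient.
    delta = -epsilon * np.sign(grad)
    return delta, a_star

# Toy example: 3 actions, 4-dimensional state.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
s = rng.normal(size=4)

delta, a_star = myopic_attack(W, s, epsilon=0.1)
# The perturbation lowers the value of the originally chosen action.
assert (W @ (s + delta))[a_star] < (W @ s)[a_star]
```

For a nonlinear Q-network the gradient would be obtained by backpropagation instead of read off a weight matrix, but the structure of the attack, perturbing the state against the gradient of the chosen action's value under a norm bound, is the same.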


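The sparsity claim about ℓ1-bounded perturbations can be illustrated directly. For a locally linear objective, the steepest-descent step under an ℓ1 constraint spends the entire budget on the single coordinate with the largest gradient magnitude, so the resulting perturbation touches as few observation entries as possible (e.g., a few order-book entries in the trading scenario above). This is an illustrative sketch under that linearity assumption, not the paper's attack.

```python
import numpy as np

def l1_steepest_descent_step(grad, epsilon):
    """Steepest descent step under an l1 constraint ||delta||_1 <= epsilon.

    For a linear objective grad . delta, the minimizer over the l1 ball
    places the whole budget epsilon on the coordinate with the largest
    absolute gradient, so the perturbation is maximally sparse.
    """
    delta = np.zeros_like(grad, dtype=float)
    i = int(np.argmax(np.abs(grad)))        # most influential coordinate
    delta[i] = -epsilon * np.sign(grad[i])  # spend the full budget there
    return delta

grad = np.array([0.2, -1.5, 0.3, 0.7])
delta = l1_steepest_descent_step(grad, epsilon=0.05)
assert np.count_nonzero(delta) == 1           # sparse: one entry perturbed
assert np.isclose(np.abs(delta).sum(), 0.05)  # budget fully used
```

By contrast, the ℓ∞-constrained step above perturbs every component of the state; this difference is what makes ℓ1-bounded attacks attractive when only a few observation entries can be manipulated.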