DAYLIGHT: ASSESSING GENERALIZATION SKILLS OF DEEP REINFORCEMENT LEARNING AGENTS

Abstract

Deep reinforcement learning algorithms have recently achieved significant success in learning high-performing policies from purely visual observations. The ability to perform end-to-end learning from raw, high-dimensional input alone has led to deep reinforcement learning algorithms being deployed in a variety of fields. Thus, understanding and improving the ability of deep reinforcement learning agents to generalize to unseen data distributions is of critical importance. Much recent work has focused on assessing the generalization of deep reinforcement learning agents by introducing specifically crafted adversarial perturbations to their inputs. In this paper, we propose another approach that we call daylight: a framework to assess the generalization skills of trained deep reinforcement learning agents. Rather than focusing on worst-case analysis of distribution shift, our approach is based on black-box perturbations that correspond to semantically meaningful changes to the environment or to the agent's visual observation system, ranging from brightness changes to compression artifacts. We demonstrate that even the smallest changes in the environment cause the performance of the agents to degrade significantly in various games from the Atari environment, despite having orders of magnitude lower perceptual similarity distance compared to state-of-the-art adversarial attacks. We show that our framework captures a diverse set of bands in the Fourier spectrum, giving a better overall understanding of the agent's generalization capabilities. We believe our work can be crucial towards building resilient and generalizable deep reinforcement learning agents.

1. INTRODUCTION

Following the initial work of Mnih et al. (2015), the use of DNNs as function approximators in reinforcement learning has led to a dramatic increase in the capabilities of RL agents Schulman et al. (2017); Lillicrap et al. (2015). In particular, these developments allow strong policies to be learned directly from raw, high-dimensional inputs such as visual observations. With the successes of these new methods come new challenges regarding the robustness and generalization capabilities of deep RL agents. Since one of the main reasons for the success and popularity of deep RL is the ability to learn directly from visual observations alone, the non-robustness of these agents to small adversarial perturbations is a serious concern Chokshi (2020); Vlasic & Boudette (2016); Kunkle (2018). However, existing adversarial formulations for deep reinforcement learning require high computational effort to produce the perturbations, knowledge of the network used to train the agent, knowledge of the environment, and real-time access to and manipulation of the agent's state observations. In this paper, we propose a more realistic scenario in which the adversary has access to none of the above and instead consists of realistic changes in the natural environment or in the agent's observation system. For instance, if a deep reinforcement learning agent is operating a self-driving car, one could plausibly expect changes in daylight levels, shifts in angle due to terrain, fog on the camera lens, or compression artifacts from the camera processor. We believe that our proposed framework is semantically more meaningful than arbitrary ℓp-norm bounded pixel perturbations. Prior work on image classification by Dodge & Karam (2016) showed that image quality distortions can reduce the accuracy of DNN classifiers. Moreover, recent work by Ford et al. (2019) showed that while adversarial training for image classifiers reduced their vulnerability to perturbations corresponding to high frequencies in the Fourier domain, it actually made the models more vulnerable to low-frequency perturbations, including fog and contrast changes. Therefore, it is important to investigate model robustness and generalization across different bands of the frequency domain. We believe that being able to accurately assess the generalization capabilities of deep reinforcement learning agents is an initial step towards building robust and reliable agents. For these reasons, in this work we investigate the robustness of trained deep reinforcement learning agents and make the following contributions:

• We propose a realistic threat model called daylight and a generalization framework for deep reinforcement learning agents that assesses the robustness of agents to basic environmental and observational changes.

• We run multiple experiments on various games in the Atari environment to demonstrate the degradation in performance of deep reinforcement learning agents.

• We compare our threat model with the state-of-the-art adversarial method based on ℓp-norm changes, and we show that our daylight framework achieves a comparable, and almost always larger, impact with lower perceptual similarity distance.

• We evaluate the daylight framework in the time domain and show that several works based on the timing perspective of adversarial formulations can be revisited within our daylight framework.

• Finally, we investigate the frequency domain of our framework and of state-of-the-art targeted attacks. We show that our framework captures different bands of the frequency spectrum, thus yielding a better estimate of model robustness.
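As a concrete illustration, observation perturbations of the kind described above (brightness shifts, lens blur, compression-style artifacts) can be applied to an Atari-style frame with a few lines of image processing. The sketch below is a minimal, hypothetical implementation in plain numpy; the function names, parameter values, and the quantization stand-in for compression artifacts are our own illustrative assumptions, not the exact settings of the daylight framework.

```python
import numpy as np

def adjust_brightness(obs: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities (daylight-level change), clipping to [0, 255]."""
    return np.clip(obs.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def gaussian_blur(obs: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur, a simple stand-in for fog/defocus on the lens."""
    radius = int(3 * sigma)
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-xs**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    blurred = obs.astype(np.float32)
    # Convolve each column, then each row, with the 1-D Gaussian kernel.
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, blurred)
    return np.clip(blurred, 0, 255).astype(np.uint8)

def quantize(obs: np.ndarray, levels: int = 16) -> np.ndarray:
    """Coarse intensity quantization, a crude proxy for compression artifacts."""
    step = 256 // levels
    return (obs // step) * step

# Example: perturb a random 84x84 grayscale frame (the standard Atari input size).
obs = np.random.randint(0, 256, size=(84, 84), dtype=np.uint8)
dark = adjust_brightness(obs, 0.5)
foggy = gaussian_blur(obs, sigma=1.0)
compressed = quantize(obs, levels=16)
```

Each transformation is black-box in the sense used above: it needs no access to the agent's network, the environment dynamics, or any gradient information, only the raw observation.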

2.1. PRELIMINARIES

In this paper we consider Markov Decision Processes (MDPs) given by a tuple (S, A, P, r, γ, s_0). The reinforcement learning agent interacts with the MDP by observing states s ∈ S and taking actions a ∈ A. The probability of transitioning to state s′ when the agent takes action a in state s is determined by the transition probability function P : S × A × S → R. The reward received by the agent when taking action a in state s is given by the reward function r : S × A → R. The goal of the agent is to learn a policy π_θ : S × A → R that selects actions maximizing the cumulative discounted reward ∑_{t=0}^{T−1} γ^t r(s_t, a_t), where s_0 is the initial state of the agent and γ is the discount factor. For deep Q-networks (DQN) the optimal policy is determined by learning the state-action value function Q(s, a). For a state s we use F(s) to denote its 2D discrete Fourier transform.

One line of research has focused on the high sensitivity of deep neural networks to imperceptible, adversarial perturbations to visual inputs, first in the setting of image classification Szegedy et al. (2014); Goodfellow et al. (2015) and more recently for deep reinforcement learning Huang et al. (2017); Kos & Song (2017). Szegedy et al. (2014) proposed to create adversarial perturbations by minimizing the distance between the original image and the adversarially produced image, using box-constrained L-BFGS to solve this optimization problem. Goodfellow et al. (2015) introduced the fast gradient method (FGM)

x_adv = x + ε · ∇_x J(x, y) / ‖∇_x J(x, y)‖_p,

for crafting adversarial examples in image classification by taking the gradient of the cost function J(x, y) used to train the neural network with respect to the input, where x is the input, y is the output label, and ε bounds the perturbation size. Carlini & Wagner (2017) introduced targeted attacks in the image classification domain based on minimizing the distance between the adversarial image and the original image while targeting a particular label. In the deep reinforcement learning domain the Carlini & Wagner (2017) formulation is
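The FGM update can be sketched end to end on a toy differentiable model. The sketch below uses a hand-written binary logistic-regression loss so the gradient ∇_x J(x, y) is available in closed form; the model, data, and step size ε are illustrative assumptions, not the networks attacked in our experiments.

```python
import numpy as np

def fgm_perturb(x, y, w, b, eps=0.1, p=2):
    """Fast gradient method: x_adv = x + eps * grad / ||grad||_p,
    where grad is the gradient of the training loss with respect to the input x.

    Toy model: binary logistic regression with label y in {-1, +1},
    loss J(x, y) = -log sigmoid(y * (w.x + b))."""
    z = y * (np.dot(w, x) + b)
    sig = 1.0 / (1.0 + np.exp(-z))
    grad = -(1.0 - sig) * y * w          # dJ/dx in closed form
    norm = np.linalg.norm(grad, ord=p)
    if norm == 0:
        return x.copy()                  # loss is flat; nothing to ascend
    return x + eps * grad / norm

# Example: perturb a random input against a random linear model.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)
x_adv = fgm_perturb(x, y=1, w=w, b=0.0, eps=0.1, p=2)
```

Because the perturbation is the normalized gradient scaled by ε, the ℓ_p distance between x_adv and x is exactly ε, which is the bounded-perturbation budget that our daylight framework is later compared against.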

