DAYLIGHT: ASSESSING GENERALIZATION SKILLS OF DEEP REINFORCEMENT LEARNING AGENTS

Abstract

Deep reinforcement learning algorithms have recently achieved significant success in learning high-performing policies from purely visual observations. The ability to perform end-to-end learning from raw, high-dimensional input alone has led to deep reinforcement learning algorithms being deployed in a variety of fields. Thus, understanding and improving the ability of deep reinforcement learning agents to generalize to unseen data distributions is of critical importance. Much recent work has focused on assessing the generalization of deep reinforcement learning agents by introducing specifically crafted adversarial perturbations to their inputs. In this paper, we propose a different approach that we call daylight: a framework to assess the generalization skills of trained deep reinforcement learning agents. Rather than focusing on worst-case analysis of distribution shift, our approach is based on black-box perturbations that correspond to semantically meaningful changes to the environment or to the agent's visual observation system, ranging from brightness shifts to compression artifacts. We demonstrate that even the smallest changes in the environment cause the performance of the agents to degrade significantly in various games from the Atari environment, despite having orders of magnitude lower perceptual similarity distance compared to state-of-the-art adversarial attacks. We show that our framework captures a diverse set of bands in the Fourier spectrum, giving a better overall understanding of the agent's generalization capabilities. We believe our work is a crucial step towards building resilient and generalizable deep reinforcement learning agents.

1. INTRODUCTION

Following the initial work of Mnih et al. (2015), the use of DNNs as function approximators in reinforcement learning has led to a dramatic increase in the capabilities of RL agents Schulman et al. (2017); Lillicrap et al. (2015). In particular, these developments allow for the direct learning of strong policies from raw, high-dimensional inputs such as visual observations. With the successes of these new methods come new challenges regarding the robustness and generalization capabilities of deep RL agents.

One line of research has focused on the high sensitivity of deep neural networks to imperceptible, adversarial perturbations to visual inputs, first in the setting of image classification Szegedy et al. (2014); Goodfellow et al. (2015) and more recently for deep reinforcement learning Huang et al. (2017); Kos & Song (2017). Since one of the main reasons for the success and popularity of deep RL is the ability to learn directly from visual observations alone, this non-robustness to small adversarial perturbations is a serious concern Chokshi (2020); Vlasic & Boudette (2016); Kunkle (2018). However, existing adversarial formulations for deep reinforcement learning require high computational effort to produce the perturbations, knowledge of the network used to train the agent, knowledge of the environment, and real-time access to and manipulation of the agent's state observations. In this paper, we propose a more realistic scenario in which we have access to none of the above, and the adversary essentially consists of realistic changes in the natural environment or in the agent's observation system. For instance, if our deep reinforcement learning agent is operating a self-driving car, one could plausibly expect changes in daylight levels, shifts in angle due to terrain, fog on the camera lens, or compression artifacts from the camera processor. We believe that our proposed framework is semantically more meaningful than arbitrary ℓp-norm bounded pixel perturbations.
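The black-box perturbations described above can be illustrated with a minimal sketch. The functions below are illustrative assumptions, not the paper's exact implementation: they apply a brightness shift and additive sensor noise to an Atari-style 84x84 grayscale observation frame, exactly the kind of semantically meaningful change that requires no access to the agent's network, the environment dynamics, or per-step observation manipulation.

```python
import numpy as np

def perturb_brightness(obs, factor=1.3):
    # Scale pixel intensities; mimics a change in ambient daylight level.
    return np.clip(obs.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def perturb_noise(obs, sigma=8.0, seed=None):
    # Additive Gaussian noise; mimics sensor noise in the camera pipeline.
    rng = np.random.default_rng(seed)
    noisy = obs.astype(np.float32) + rng.normal(0.0, sigma, obs.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Example: perturb a random 84x84 grayscale Atari-style frame.
frame = np.random.randint(0, 256, (84, 84), dtype=np.uint8)

for name, perturbed in [("brightness", perturb_brightness(frame)),
                        ("noise", perturb_noise(frame, seed=0))]:
    # In the full framework, the perturbed frames would be fed to the trained
    # agent and its episode return compared against the clean baseline.
    print(name, perturbed.shape, perturbed.dtype)
```

In an evaluation loop, one would wrap the environment's observation output with such a perturbation and measure the drop in episode return relative to unperturbed play.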

