EXPLAINABILITY OF DEEP REINFORCEMENT LEARNING ALGORITHMS IN ROBOTIC DOMAINS BY USING LAYER-WISE RELEVANCE PROPAGATION

Abstract

A key component of the recent success of reinforcement learning is the introduction of neural networks for representation learning, which enables solving challenging problems in several domains, one of which is robotics. However, a major criticism of deep reinforcement learning (DRL) algorithms is their lack of explainability and interpretability. This problem is exacerbated in robotics, where agents often share space with humans, making it imperative to be able to reason about their behaviour. In this paper, we propose to analyze the learned representation in a robotic setting by utilizing graph neural networks. Using graph neural networks and Layer-wise Relevance Propagation (LRP), we represent the observations as an entity-relationship graph, which allows us to interpret the learned policy. We evaluate our approach in two MuJoCo environments that were carefully designed to measure the value of the knowledge gained by our approach to analyzing learned representations. This approach allows us not only to analyze how different parts of the observation space contribute to the decision-making process, but also to differentiate between policies and explain their differences in performance. This, in turn, allows for reasoning about the agent's recovery from faults. These insights are key contributions to explainable deep reinforcement learning in robotic settings.

1. INTRODUCTION

While Deep Reinforcement Learning (DRL) has shown tremendous success in domains like games, highly structured robotic settings, and other real-world domains, it is still held back by concerns over its safety and explainability. Because DRL leverages non-linear function approximators (i.e., neural networks), its behaviour cannot be fully understood and anticipated. Especially in domains where DRL agents are deployed alongside humans, they are expected to behave as those humans anticipate. To fully harness the potential of this powerful technique, it is paramount to translate the internal state of DRL approaches into human-understandable signals. A Reinforcement Learning (RL) agent interacts with the environment to gain knowledge and learn how to act: it observes the environment, takes an action accordingly, receives feedback, and updates its behaviour based on that feedback. This self-training ability makes RL a complex learning procedure and creates many challenges in interpreting its behaviour. Combining RL with the representation learning power of Deep Learning (DL) models further adds to this complexity. Explaining the policy learned by a black-box DL model acting as a function approximator is one of the major challenges in interpreting DRL models. One method proposed to tackle this challenge is State Representation Learning (SRL). SRL is a feature learning method that learns a low-dimensional representation of the state from high-dimensional raw observations (such as the pixels of an image) by capturing the variation in the environment caused by the agent's actions (Lesort et al., 2018; Doncieux et al., 2018; Raffin et al., 2018; 2019; Traoré et al., 2019; Doncieux et al., 2020). While SRL methods identify the features of a high-dimensional observation that are most relevant for learning to act and compress the observation accordingly, we still need a way to highlight the most relevant features in robotic environments whose observation spaces are already low-dimensional and compact.
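To make the LRP machinery referenced above concrete, the following is a minimal sketch of the standard epsilon-rule for a single linear layer, implemented in NumPy. It is an illustration of the generic LRP redistribution step, not the paper's specific implementation; the function name `lrp_linear` and the toy dimensions are our own for exposition. Relevance assigned to a layer's outputs is redistributed onto its inputs in proportion to each input's contribution to the pre-activation, with a small epsilon term stabilising the division.

```python
import numpy as np

def lrp_linear(a, w, b, relevance_out, eps=1e-6):
    """One epsilon-LRP backward step through a linear layer z = a @ w + b.

    a:             (n_in,) activations entering the layer
    w:             (n_in, n_out) weight matrix
    b:             (n_out,) bias
    relevance_out: (n_out,) relevance assigned to the layer's outputs
    Returns        (n_in,) relevance redistributed onto the inputs.
    """
    z = a @ w + b                 # forward pre-activations
    z = z + eps * np.sign(z)      # epsilon stabiliser avoids division by ~0
    s = relevance_out / z         # relevance per unit of pre-activation
    return a * (w @ s)            # share of each input, weighted by contribution

# Toy example: 3 input features, 2 output units.
rng = np.random.default_rng(0)
a = rng.random(3)
w = rng.standard_normal((3, 2))
b = np.zeros(2)
R_out = np.array([1.0, 0.0])      # all relevance placed on the first output
R_in = lrp_linear(a, w, b, R_out)
print(R_in)                       # per-input relevance scores
print(R_in.sum())                 # approximately conserved (sums to ~1.0)
```

Applied layer by layer from the policy head back to the observation, this rule yields a relevance score per observation dimension, which is what allows attributing an agent's action to individual entities in a low-dimensional robotic observation space. Note that with zero bias the total relevance is conserved up to the epsilon term; non-zero biases absorb a share of it.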

