EXPLAINABILITY OF DEEP REINFORCEMENT LEARNING ALGORITHMS IN ROBOTIC DOMAINS BY USING LAYER-WISE RELEVANCE PROPAGATION

Abstract

A key component of the recent success of reinforcement learning is the introduction of neural networks for representation learning, which allows challenging problems to be solved in several domains, one of which is robotics. However, a major criticism of deep reinforcement learning (DRL) algorithms is their lack of explainability and interpretability. This problem is exacerbated further in robotics, as robots often share space with humans, making it imperative to be able to reason about their behaviour. In this paper, we propose to analyze the learned representation in a robotic setting by utilizing graph neural networks. Using graph neural networks and Layer-wise Relevance Propagation (LRP), we represent the observations as entities and relationships, which allows us to interpret the learned policy. We evaluate our approach in two MuJoCo environments that were deliberately designed to measure the value of the knowledge gained by our analysis of learned representations. This approach allows us not only to analyze how different parts of the observation space contribute to the decision-making process but also to differentiate between policies and explain their differences in performance. The same analysis further allows reasoning about the agent's recovery from faults. These insights are key contributions to explainable deep reinforcement learning in robotic settings.

1. INTRODUCTION

While Deep Reinforcement Learning (DRL) has shown tremendous success in domains like games, highly structured robotic settings, and other real-world domains, it is still held back by concerns over its safety and explainability. Because DRL leverages non-linear function approximators (i.e., neural networks), its behaviour cannot be fully understood and anticipated. Especially in domains where DRL is deployed alongside humans, it is expected to perform as those humans anticipate. To fully harness the potential of this powerful technique, it is paramount to translate the internal state of DRL approaches into human-understandable signals. A Reinforcement Learning (RL) agent interacts with the environment to gain knowledge and learn to perform a task: it observes the environment, takes an action accordingly, receives feedback, and updates its behavior based on that feedback. This self-training ability makes RL a complex learning procedure and raises many challenges in interpreting its behavior. Combining RL with the representation learning power of Deep Learning (DL) models further adds to this complexity. Explaining the policy learned by a black-box DL model acting as a function approximator is one of the major challenges in interpreting DRL models. One method proposed to tackle this challenge is State Representation Learning (SRL). SRL is a feature learning method that learns a low-dimensional representation of the state from high-dimensional raw observations (like the pixels of an image) by capturing the variation in the environment caused by the agent's actions (Lesort et al., 2018; Doncieux et al., 2018; Raffin et al., 2018; 2019; Traoré et al., 2019; Doncieux et al., 2020). While SRL methods identify the features of a high-dimensional observation that are most relevant for learning to act and compact the observation accordingly, robotic environments whose observation spaces are already low-dimensional and compact still require a method for highlighting the most relevant features.
This work aims at identifying the relevance of each entity of a robot to the decision-making process in robotic environments with low-dimensional sensory input, where the observation space is as compact as possible. A compact observation means that removing any part of the observation space would lead to a drop in performance. Saliency methods have proved successful in highlighting the most relevant pixels in image classification (Simonyan et al., 2013; Bach et al., 2015; Zhou et al., 2016; Selvaraju et al., 2017; Zhang et al., 2018a;b), and the most relevant entities and relations in graph classification (Baldassarre & Azizpour, 2019; Pope et al., 2019). Some work has extended the application of saliency methods from classification to RL, focusing on environments with visual observations; one example is explaining a DRL agent's behavior in Atari games by visualizing its decisions (Weitkamp et al., 2018; Greydanus et al., 2018; Iyer et al., 2018; Huber et al., 2019). So far, however, saliency methods have only been applied to RL problems with visual input states. In our work, a saliency method is used to highlight the contribution of each part of the robot to the policy, which helps us identify the parts contributing most and least to decision-making. Since the structure of a robot resembles a graph, we represent the robot's state as a graph and apply a saliency technique to highlight the contribution of each part of the graph to the agent's decisions. Baldassarre & Azizpour (2019) report that Layer-wise Relevance Propagation (LRP) is effective at identifying the parts of a graph that contribute most to a graph classification task; based on this, we choose LRP as our saliency method. This requires first using graph neural networks (GNN) as function approximators in our DRL algorithm. After the agent's performance converges, we apply LRP to identify the components of the robot that contribute most to learning the task.
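To make the relevance backpropagation concrete, the following is a minimal sketch of the epsilon-rule commonly used in LRP, applied to a toy two-layer policy head; the network sizes, zero biases, and feature layout are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def lrp_linear(a, W, b, R_out, eps=1e-6):
    # Epsilon-rule LRP for one dense layer: redistribute the output
    # relevance R_out onto the inputs a, in proportion to each input's
    # contribution a_i * W[i, j] to the pre-activation z_j.
    z = a @ W + b                       # pre-activations, shape (n_out,)
    s = R_out / (z + eps * np.sign(z))  # stabilized ratio, shape (n_out,)
    return a * (W @ s)                  # relevance per input, shape (n_in,)

# Toy 2-layer network: relevance of the selected action flows back
# to the (compact) observation features.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                           # observation, 4 features
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

h = np.maximum(x @ W1 + b1, 0.0)                 # ReLU hidden layer
q = h @ W2 + b2                                  # per-action scores

R_q = np.zeros_like(q)
R_q[np.argmax(q)] = q.max()                      # start from the chosen action
R_h = lrp_linear(h, W2, b2, R_q)                 # relevance of hidden units
R_x = lrp_linear(x, W1, b1, R_h)                 # per-feature relevance scores
```

With zero biases, the total relevance is (up to the epsilon stabilizer) conserved at each layer, so the per-feature scores in `R_x` sum to the score of the explained action, which is what makes them readable as a contribution heat map.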
A robot consists of a number of body parts connected through joints. Bodies and joints correspond to the nodes and edges of a graph, respectively, so a graph representation decomposes the robot into its entities and their relationships. This kind of representation has been used in DRL before by Sanchez-Gonzalez et al. (2018) and Wang et al. (2018). LRP then highlights the relevance of every element of the action output to each entity of the observation graph, creating a heat map of action-entity relevance from which we can distinguish the most relevant parts from the least relevant ones. Knowing the contribution of every entity of the robot to the decision-making process is highly valuable. One application is visualizing the training process by identifying which of the robot's entities contribute to learning the task at each stage of training. For intuition, consider a child learning to stand up: during the initial stages, they use their hands for assistance, but in later stages they can stand up easily without them. Accordingly, during the early stages of training the contribution scores of both hands and legs would be high, while in later stages the contribution of the hands drops. Another application arises during a malfunction, where part of the robot is broken. Knowing the importance of the broken part helps us gauge how severe the damage is and whether the agent can recover from the malfunction. This recovery can take the form of learning a new policy from scratch for the new dynamics or transferring the policy trained under the previous dynamics. If we choose to adapt to the new dynamics after a malfunction, our method can explain the adaptation process. As an analogy, imagine a right-handed person who breaks their right hand and starts writing with their left hand instead. Under the original dynamics, the contribution of the right hand to the writing task is highest; after adaptation to the new dynamics, the importance of the left hand rises while that of the right hand drops.
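As an illustration of this entity-relationship view, the sketch below encodes a hypothetical two-link arm as a graph, with bodies as nodes and joints as directed parent-to-child edges; the body names, feature layouts, and sender/receiver encoding are our illustrative assumptions, not the paper's exact MuJoCo environments:

```python
import numpy as np

# Hypothetical two-link arm: bodies become graph nodes, joints become edges.
bodies = ["torso", "upper_arm", "lower_arm"]
joints = [("torso", "upper_arm"),       # shoulder joint
          ("upper_arm", "lower_arm")]   # elbow joint

node_idx = {name: i for i, name in enumerate(bodies)}

# Per-entity features; in a simulator these would be filled from the state
# (e.g. positions/velocities for bodies, angle/angular velocity for joints).
node_feats = np.zeros((len(bodies), 4))   # e.g. [x, y, vx, vy] per body
edge_feats = np.zeros((len(joints), 2))   # e.g. [angle, ang_vel] per joint

# Sender/receiver index arrays, a common edge encoding for GNN libraries.
senders = np.array([node_idx[s] for s, _ in joints])
receivers = np.array([node_idx[r] for _, r in joints])
```

Relevance scores produced by LRP over such a graph can then be read off per node and per edge, yielding a contribution score for each body part and joint.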

2.1. GRAPH NEURAL NETWORK

Graphs are used to represent structured data. A graph contains multiple entities and the relationships among them; entities and relationships are represented by the nodes and edges of the graph, respectively, which gives us flexibility in designing representation architectures of arbitrary shapes. Furthermore, this form of knowledge representation emphasizes the location of each entity relative to the others. Graph neural networks (GNN) are neural networks that operate on graph inputs. GNNs impose constraints on the relationships and interactions among entities while finding the optimal solution; in other words, they emphasize a relational inductive bias. Our GNN architecture and operations follow Battaglia et al. (2018). In an input graph, there are three kinds

