INTERACTIVE VISUALIZATION FOR DEBUGGING RL

Abstract

Visualization tools for supervised learning (SL) allow users to interpret, introspect, and gain an intuition for the successes and failures of their models. While reinforcement learning (RL) practitioners ask many of the same questions when debugging agent policies, existing tools are not a good fit for the RL setting because they address challenges typical of the SL regime. Whereas SL involves a static dataset, RL often entails collecting new data in challenging environments with partial observability, stochasticity, and non-stationary data distributions. This necessitates alternate visual interfaces to help us better understand agent policies trained using RL. In this work, we design and implement an interactive visualization tool for debugging and interpreting RL. Our system identifies and addresses important aspects missing from existing tools, such as (1) visualizing alternate state representations (different from those seen by the agent) that researchers could use while debugging RL policies; (2) interactive interfaces tailored to the metadata stored while training RL agents; and (3) a workflow conducive to RL policy debugging. We provide an example workflow showing how this system could be used, along with ideas for future extensions.

1. INTRODUCTION

Machine learning systems have made impressive advances due to their ability to learn high-dimensional models from large amounts of data (LeCun et al., 2015). However, high-dimensional models are hard to understand and trust (Doshi-Velez & Kim, 2017). Many tools exist for addressing this challenge in the supervised learning setting, finding use in tracking metrics (Abadi et al., 2015; Satyanarayan et al., 2017), generating graphs of model internals (Wongsuphasawat et al., 2018), and visualizing embeddings (van der Maaten & Hinton, 2008). However, there is no corresponding set of tools for the reinforcement learning setting. At first glance, it appears we may repurpose existing tools for this task. However, we quickly run into limitations that arise from the intent with which these tools were designed. Reinforcement learning (RL) is a more interactive science (Neftci & Averbeck, 2019) than supervised learning, owing to a stronger feedback loop between the researcher and the agent. Whereas supervised learning involves a static dataset, RL often entails collecting new data. To fully understand an RL algorithm, we must understand the effect it has on the data collected; in supervised learning, by contrast, the learned model has no effect on the fixed dataset.

Visualization systems are important for overcoming these challenges. At their core, visualization systems consist of two components: representation and interaction. Representation is concerned with how data is mapped to a visual form and then rendered. Interaction is concerned with the dialog between the user and the system as the user explores the data to uncover insights (Yi et al., 2007). Though they appear disparate, these two processes have a symbiotic influence on each other: the tools we use for representation affect how we interact with the system, and our interactions affect the representations that we create.
Thus, while designing visualization systems, it is important to think about the application domain from which the data originates, in this case reinforcement learning. Using existing tools, we can plot descriptive metrics such as cumulative reward, TD-error, and action values, to name a few. However, it is harder to pose and easily answer questions such as:
- How does the agent's state-visitation distribution change as training progresses?
- What effect do noteworthy, influential states have on the policy?
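To make the first question concrete: answering it requires aggregating per-state visit counts from trajectories logged at different training checkpoints, rather than reading off a scalar metric. The following is a minimal sketch of that aggregation step, not part of the system described in this paper; the function name and the discretization hook are our own assumptions, and real agents would typically need a task-specific discretizer for continuous states.

```python
from collections import Counter

def state_visitation(trajectories, discretize=lambda s: s):
    """Estimate the empirical state-visitation distribution from
    logged trajectories (each trajectory is a sequence of states).

    `discretize` maps a raw state to a hashable bucket; the default
    identity works only for already-discrete states.
    """
    counts = Counter()
    for traj in trajectories:
        for state in traj:
            counts[discretize(state)] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

# Toy checkpoints: trajectories saved early vs. late in training.
early = [[0, 0, 1], [0, 1, 2]]
late = [[0, 2, 3], [1, 2, 3]]

print(state_visitation(early))  # mass concentrated near state 0
print(state_visitation(late))   # mass has shifted toward states 2 and 3
```

Comparing the two returned distributions (e.g., as side-by-side histograms, or via a divergence measure between checkpoints) is one simple way to see how the visitation distribution shifts as training progresses.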



An interactive (anonymized) demo of the system can be found at https://vizarel-demo.herokuapp.com

