BRIDGING THE GAP: PROVIDING POST-HOC SYMBOLIC EXPLANATIONS FOR SEQUENTIAL DECISION-MAKING PROBLEMS WITH INSCRUTABLE REPRESENTATIONS

Abstract

As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and of allowing users to contest those decisions. A significant hurdle to enabling such explanatory dialogue is the potential vocabulary mismatch between the user and the AI system. This paper introduces methods for providing contrastive explanations, in terms of user-specified concepts, for sequential decision-making settings in which the system's model of the task may be best represented as a black-box simulator. We do this by building partial symbolic models of a local approximation of the task that can be leveraged to answer user queries. We empirically test these methods on a popular Atari game (Montezuma's Revenge) and on modified versions of Sokoban (a well-known planning benchmark), and we report the results of user studies evaluating whether people find explanations generated in this form useful.

1. INTRODUCTION

For AI systems to be truly effective in the real world, they need not only to be capable of coming up with near-optimal decisions but also of working effectively with their end users. A key requirement for such collaboration is to allow users to raise explanatory queries through which they can contest the system's decisions. One obstacle to answering such questions is that the system may neither share a vocabulary with its end users nor have an explicit interpretable model of the task. More often than not, the system may be reasoning about the task in a high-dimensional space that is opaque even to the developers of the system, let alone a lay user. While there is a growing consensus within the explainable AI community that end-user explanations need to be framed in terms of user-understandable concepts, the focus has generally been on introducing such methods for explaining one-shot decisions, such as those made by classifiers (cf. Kim et al. (2018); Ribeiro et al. (2016)). This is unfortunate, as explaining sequential decision-making problems presents many challenges that are absent from one-shot decision-making scenarios. In such problems, we must deal not only with possible interrelationships among the actions in a sequence, but may also need to explain the conditions under which actions are executable and the cost of executing certain action sequences. Effectively, this means that explaining a plan or policy to a user requires the system to explain the details of the domain (or at least the agent's belief about it). In this paper, we propose methods that can field some of the most fundamental explanatory queries identified in the literature, namely contrastive queries, i.e., questions of the form 'why P (the decision proposed by the system) and not Q (the alternative proposed by the user, i.e., the foil)?' (Miller, 2018), in user-understandable terms.
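The contrastive setting above can be made concrete with a small sketch. The idea is that a foil (the user's alternative action sequence) can be refuted by replaying it against a partial symbolic model whose action preconditions are stated over user-level concepts: the first action whose concept-level precondition fails becomes the explanation. All names below (the concepts, the actions, the toy simulator) are illustrative assumptions, not the paper's actual implementation.

```python
# A hypothetical learned partial model: each action maps to the set of
# user-level concepts that must hold for the action to be executable.
partial_model = {
    "move_right": {"path_clear_right"},
    "jump_over_skull": {"skull_ahead", "on_ground"},
}

def concepts_true(state):
    """Stand-in for learned concept classifiers that map a raw
    (inscrutable) state to the user-level concepts holding in it.
    Here, for simplicity, states are already sets of concept labels."""
    return state

def explain_foil(foil, initial_state, simulate):
    """Replay the foil action by action; return the first action whose
    concept-level precondition is violated, together with the missing
    concepts -- this pair serves as the contrastive explanation."""
    state = initial_state
    for action in foil:
        missing = partial_model[action] - concepts_true(state)
        if missing:
            return (action, missing)
        state = simulate(state, action)
    return None  # foil is executable; an explanation must cite cost instead

# Toy transition function over concept sets (purely for illustration).
def simulate(state, action):
    if action == "move_right":
        return {"skull_ahead", "on_ground"}  # a skull now blocks the path
    return {"on_ground"}  # landed past the skull, path not yet clear

foil = ["move_right", "jump_over_skull", "move_right"]
result = explain_foil(foil, {"path_clear_right", "on_ground"}, simulate)
print(result)  # the final move_right fails: 'path_clear_right' is missing
```

In this toy run the first two actions succeed, but the final `move_right` is flagged because the concept `path_clear_right` does not hold in the reached state, which is exactly the kind of concept-level answer a user can act on.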
Our methods achieve this by building partial and abstract symbolic models (Section 2), expressed in terms of the user's vocabulary, that approximate the task details relevant to the specific query raised by the user. To the best of our knowledge, ours is the first work to propose learning symbolic local approximations of a problem's dynamics and cost function for generating explanations in sequential decision-making settings. Specifically, we will focus on

