BRIDGING THE GAP: PROVIDING POST-HOC SYMBOLIC EXPLANATIONS FOR SEQUENTIAL DECISION-MAKING PROBLEMS WITH INSCRUTABLE REPRESENTATIONS

Abstract

As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and allowing users to contest these decisions. A significant hurdle to allowing for such explanatory dialogue could be the vocabulary mismatch between the user and the AI system. This paper introduces methods for providing contrastive explanations in terms of user-specified concepts for sequential decision-making settings where the system's model of the task may be best represented as a blackbox simulator. We do this by building partial symbolic models of a local approximation of the task that can be leveraged to answer the user queries. We empirically test these methods on a popular Atari game (Montezuma's Revenge) and modified versions of Sokoban (a well known planning benchmark) and report the results of user studies to evaluate whether people find explanations generated in this form useful.

1. INTRODUCTION

For AI systems to be truly effective in the real world, they need to not only be capable of coming up with near-optimal decisions but also be capable of working effectively with their end users. One of the key requirements for such collaboration is to allow users to raise explanatory queries wherein they can contest the system's decisions. An obstacle to answering such queries is the fact that these systems may not share a vocabulary with their end users or have an explicit interpretable model of the task. More often than not, the system may be reasoning about the task in a high-dimensional space that is opaque even to the developers of the system, let alone a lay user. While there is a growing consensus within the explainable AI community that end-user explanations need to be framed in terms of user-understandable concepts, the focus has generally been on introducing such methods for explaining one-shot decisions, as in the case of classifiers (cf. Kim et al. (2018); Ribeiro et al. (2016)). This is unfortunate, as explaining sequential decision-making problems presents many challenges that are absent from one-shot decision-making scenarios. In such problems, we not only have to deal with possible interrelationships between the actions in the sequence, but may also need to explain conditions for the executability of actions and the cost of executing certain action sequences. Effectively, this means that explaining a plan or policy to a user requires the system to explain the details of the domain (or at least the agent's belief of it). In this paper, we propose methods that can field some of the most fundamental explanatory queries identified in the literature, namely contrastive queries, i.e., questions of the form 'why P (the decision proposed by the system) and not Q (the alternative proposed by the user, or the foil)?' (Miller, 2018), in user-understandable terms.
Our methods achieve this by building partial and abstract symbolic models (Section 2), expressed in terms of the user's vocabulary, that approximate the task details relevant to the specific query raised by the user. To the best of our knowledge, ours is the first work to propose learning symbolic local approximations of the problem dynamics and cost function for explanations in sequential decision-making scenarios. Specifically, we focus on deterministic tasks where the system has access to a task simulator, and we identify (a) missing preconditions to explain scenarios where the foil raised by the user results in the execution failure of an action, and (b) cost function approximations to explain cases where the foil is executable but suboptimal (Section 3). We learn such models by interacting with the simulator (on randomly sampled states) while using learned classifiers that detect the presence of user-specified concepts in the simulator states. Figure 1 presents the overall flow of this process with illustrative explanations in the context of a slightly updated version of Montezuma's Revenge (Wikipedia contributors, 2019). Our methods also allow for the calculation of confidence over the explanations and explicitly take into account the fact that learned classifiers for user-specified concepts may be noisy. This ability to quantify its belief about the correctness of an explanation is an important capability for any post-hoc explanation system that may influence the user's decisions. We evaluate the system on two popular sequential decision-making domains, Montezuma's Revenge and modified versions of Sokoban (Botea et al., 2002) (a game involving players pushing boxes to specified targets). We present user study results that show the effectiveness of the explanations studied in this paper (Section 5).
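The precondition-identification step described above can be illustrated with a simplified sketch. All names here (the toy simulator, the concept classifiers, the candidate-filtering loop) are hypothetical stand-ins, not the paper's actual algorithm: the idea is that a user-specified concept survives as a candidate precondition of an action only if it is detected (by its classifier) in every sampled state where the action executes successfully.

```python
def candidate_preconditions(sim, action, concepts, classifiers, sample_states):
    """Return concepts that behave like preconditions of `action` on the samples:
    a concept survives only if its classifier fires in every state where the
    action succeeds. (Noisy classifiers would make this probabilistic.)"""
    candidates = set(concepts)
    for s in sample_states:
        if sim.step(s, action) is not None:  # None stands in for the invalid state
            present = {c for c in concepts if classifiers[c](s)}
            candidates &= present
    return candidates

# Toy setup: "open" succeeds only when the agent holds a key.
class ToySim:
    def step(self, s, a):
        if a == "open":
            return dict(s, opened=True) if s["key"] else None
        return s

classifiers = {"has_key": lambda s: s["key"],
               "near_door": lambda s: s["near"]}
samples = [{"key": True, "near": True},
           {"key": True, "near": False},
           {"key": False, "near": True}]

found = candidate_preconditions(ToySim(), "open",
                                ["has_key", "near_door"], classifiers, samples)
```

Here `near_door` is eliminated because `open` succeeds in a sampled state where it is absent, leaving `has_key` as the candidate missing precondition to report to the user.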

2. BACKGROUND

Our focus here is to address cases where a user is trying to make sense of agent behavior that may in fact be optimal in the agent's model of the task but is confusing to the user owing to a differing understanding of the task (or to the user overlooking some facts about it).

Figure 1: The explanatory dialogue starts when the user presents a specific alternate plan (foil). We consider two foils, one that is invalid and another that is costlier than the system's plan. The system explains the invalid foil by pointing out an action precondition that was not met in the foil, while it explains the foil's suboptimality by informing the user about the cost function. This model information is expressed in terms of concepts specified by the user, which we operationalize by learning classifiers for each concept.

Thus our focus isn't on how the agent came up with the specific decisions, but only on why this action sequence was chosen instead of an alternative the user expected. We assume the agent has access to a deterministic simulator of the form M_sim = ⟨S, A, T, C⟩, where S represents the set of possible world states, A the set of actions, and T the transition function that specifies the task dynamics. The transition function is defined as T : S × A → S ∪ {⊥}, where ⊥ corresponds to an invalid absorber state generated by the execution of an infeasible action. The invalid state can be used to capture failure states that occur when the agent violates hard constraints, such as safety constraints. Finally, C : S × A → R captures the cost of executing an action at a particular state (with the cost of an infeasible action taken to be infinite). We overload the transition and cost functions to also take in a sequence of actions (the transition function then returns the final state resulting from executing the action sequence, and the cost function the cost of executing that sequence).
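The simulator interface above can be sketched in code. This is a minimal illustration under our own assumptions (a toy one-dimensional domain with made-up actions `right`/`left`), not an interface prescribed by the paper; `None` stands in for the invalid absorber state ⊥, and `run` shows the overloaded transition/cost functions over an action sequence.

```python
from typing import Optional, Sequence, Tuple

INVALID = None  # stands in for the invalid absorber state ⊥

class GridSim:
    """Toy deterministic simulator M_sim = <S, A, T, C>: states are
    positions 0..4 on a line; moving off either end is infeasible."""

    def step(self, state: int, action: str) -> Optional[int]:
        # T : S x A -> S ∪ {⊥}
        nxt = state + (1 if action == "right" else -1)
        return nxt if 0 <= nxt <= 4 else INVALID

    def cost(self, state: int, action: str) -> float:
        # C : S x A -> R, infinite for infeasible actions
        return 1.0 if self.step(state, action) is not INVALID else float("inf")

def run(sim: GridSim, state: int, plan: Sequence[str]) -> Tuple[Optional[int], float]:
    """Overloaded T and C: execute a whole action sequence, returning the
    final state and total cost (⊥ and infinite cost if any step fails)."""
    total = 0.0
    for a in plan:
        total += sim.cost(state, a)
        state = sim.step(state, a)
        if state is INVALID:
            return INVALID, float("inf")
    return state, total
```

For example, `run(GridSim(), 0, ["right", "right"])` yields state 2 at cost 2.0, while `run(GridSim(), 0, ["left"])` immediately hits the absorber state with infinite cost.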
We will consider goal-directed agents that are trying to drive the state of the world to one of the goal states, where the tuple Π_sim = ⟨I, G, M_sim⟩ represents the agent's decision-making problem (I is the initial state and G the set of goal states). The agent comes up with a plan (a sequence of actions) π such that T(I, π) ∈ G, and the plan is said to be optimal if there exists no cheaper plan that achieves the goal. We will use symbolic action models with preconditions and cost functions (similar to STRIPS models (Geffner & Bonet, 2013)) as a way to approximate the problem for explanations. Such a model can be represented by the tuple M_S = ⟨F_S, A_S, I_S, G_S, C_S⟩, where F_S is a set of propositional fluents defining the state space, A_S is the set of actions, I_S is the initial state, and G_S is the goal specification. Each valid problem state is uniquely identified by the subset of fluents that are true in that state (so for any state s ∈ S_{M_S}, where S_{M_S} is the set of states of M_S, s ⊆ F_S). Each action a ∈ A_S is further described in terms of its preconditions prec_a (a specification of the states in which a is executable) and the effects of executing the action. We will denote the state formed by executing action a in a state s as a(s). We will focus on models where the preconditions are represented as a conjunction of state factors. If an action is executed in a state with missing preconditions, the execution results in the invalid state (⊥). Unlike standard
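The STRIPS-style semantics above can be made concrete with a short sketch, assuming a common encoding that the paper does not itself prescribe: a state is the frozen set of fluents that are true in it, an action carries precondition, add, and delete sets (the `unlock` action below is an invented example), and executing an action with a missing precondition yields `None` in place of ⊥.

```python
def apply_action(state: frozenset, action: dict):
    """Compute a(s) for a STRIPS-style action. The state is the set of true
    fluents; preconditions are a conjunction, checked as a subset test.
    Returns None (standing in for ⊥) if any precondition is missing."""
    if not action["prec"] <= state:
        return None
    return (state - action["del"]) | action["add"]

# Hypothetical example action: unlocking a door consumes the key.
unlock = {"prec": frozenset({"has_key", "at_door"}),
          "add": frozenset({"door_open"}),
          "del": frozenset({"has_key"})}
```

Applying `unlock` in a state containing both `has_key` and `at_door` produces a state with `door_open` (and without `has_key`), while applying it without the key returns the invalid state.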

