ON RELATING 'WHY?' AND 'WHY NOT?' EXPLANATIONS

Abstract

Explanations of Machine Learning (ML) models often address a 'Why?' question. Such explanations can be viewed as selecting a set of feature-value pairs that is sufficient for the prediction. Recent work has investigated explanations that address a 'Why Not?' question, i.e. finding a change of feature values that guarantees a change of prediction. Given their goals, these two forms of explaining predictions of ML models appear to be mostly unrelated. However, this paper demonstrates otherwise, and establishes a rigorous formal relationship between 'Why?' and 'Why Not?' explanations. Concretely, the paper proves that, for any given instance, 'Why?' explanations are minimal hitting sets of 'Why Not?' explanations and vice-versa. Furthermore, the paper devises novel algorithms for extracting and enumerating both forms of explanations.

1. INTRODUCTION

The importance of devising mechanisms for computing explanations of Machine Learning (ML) models cannot be overstated, as illustrated by the fast-growing body of work in this area. A glimpse of the importance of explainable AI (XAI) is offered by the growing number of recent surveys and overviews, e.g. Hoffman & Klein (2017), among many others.

Another dimension of explanations, studied in recent work Miller (2019b), is the difference between explanations for 'Why prediction π?' questions, e.g. 'Why did I get the loan?', and for 'Why prediction π and not δ?' questions, e.g. 'Why didn't I get the loan?'. Explanations for 'Why Not?' questions, labelled contrastive explanations by Miller (2019b), isolate a pragmatic component of explanations that abductive explanations lack. Concretely, an abductive explanation identifies a set of feature values which are sufficient for the model to make a prediction π, and thus provides an answer to the question 'Why π?'. A contrastive explanation sets up a counterfactual link between what was a (possibly) desired outcome for a certain set of features and what was the observed outcome Bromberger (1962); Achinstein (1980). Thus, a contrastive explanation answers a 'Why π and not δ?' question Miller (2018).

This paper shows that recent approaches for computing rigorous abductive explanations can also be exploited for computing contrastive explanations. To our knowledge, this is new. In addition, we demonstrate that rigorous (model-based) local abductive and contrastive explanations are related by a minimal hitting set relationship, which builds on the seminal work of Reiter in the 80s Reiter (1987). Crucially, this novel hitting set relationship reveals a wealth of algorithms for computing and for enumerating contrastive and abductive explanations. We emphasize that it allows designing the first algorithm to enumerate abductive explanations. Finally, we demonstrate the feasibility of our approach experimentally.
Furthermore, our experiments show that there is a strong correlation between contrastive explanations and explanations produced by the commonly used SHAP explainer.
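The hitting-set duality claimed above can be illustrated on a toy example. The sketch below is purely illustrative (a made-up three-feature Boolean classifier, brute-force checks instead of the reasoning oracles used by rigorous approaches): it enumerates all subset-minimal abductive and contrastive explanations of one instance and verifies that each member of one family is a minimal hitting set of the other family.

```python
from itertools import product, combinations

# Illustrative classifier: predicts 1 iff x1 AND (x2 OR x3).
FEATURES = [0, 1, 2]

def predict(x):
    return int(x[0] and (x[1] or x[2]))

def sufficient(fixed, inst):
    """True iff fixing the features in `fixed` to their values in `inst`
    forces the prediction, whatever values the free features take."""
    free = [f for f in FEATURES if f not in fixed]
    for vals in product([0, 1], repeat=len(free)):
        x = list(inst)
        for f, v in zip(free, vals):
            x[f] = v
        if predict(x) != predict(inst):
            return False
    return True

def minimal_sets(candidates):
    # Keep only the subset-minimal elements of a family of sets.
    return [s for s in candidates if not any(t < s for t in candidates)]

inst = (1, 1, 1)
subsets = [frozenset(c) for r in range(4) for c in combinations(FEATURES, r)]

# Abductive explanations (AXps): subset-minimal sufficient sets of features.
axps = minimal_sets([s for s in subsets if sufficient(s, inst)])
# Contrastive explanations (CXps): subset-minimal sets of features whose
# freeing allows the prediction to change, i.e. whose complement is not
# sufficient.
cxps = minimal_sets([s for s in subsets
                     if not sufficient(frozenset(FEATURES) - s, inst)])

def is_minimal_hitting_set(h, family):
    hits = lambda s: all(s & m for m in family)
    return hits(h) and not any(hits(h - {e}) for e in h)

# The duality: each AXp is a minimal hitting set of the CXps, and vice-versa.
assert all(is_minimal_hitting_set(a, cxps) for a in axps)
assert all(is_minimal_hitting_set(c, axps) for c in cxps)
print(sorted(map(sorted, axps)), sorted(map(sorted, cxps)))
```

For this instance the AXps are {x1, x2} and {x1, x3}, while the CXps are {x1} and {x2, x3}; the cross-family minimal-hitting-set property is easy to check by hand.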

2. PRELIMINARIES

Explainability in Machine Learning. The paper assumes an ML model M, which is represented by a finite set of first-order logic (FOL) sentences M. (When applicable, simpler alternative representations for M can be considered, e.g. (decidable) fragments of FOL, (mixed-)integer linear programming, constraint language(s), etc.) A set of features F = {f_1, ..., f_L} is assumed. Each feature f_i is categorical (or ordinal), with values taken from some set D_i. An instance is an assignment of values to features. The space of instances, also referred to as feature (or instance) space, is defined by F = D_1 × D_2 × ... × D_L. (For real-valued features, a suitable interval discretization can be considered.)

A (feature) literal λ_i is of the form (f_i = v_i), with v_i ∈ D_i. In what follows, a literal will be viewed as an atom, i.e. it can take value true or false. As a result, an instance can be viewed as a set of L literals, denoting the L distinct features, i.e. an instance contains a single occurrence of a literal defined on any given feature. A set of literals is consistent if it contains at most one literal defined on each feature. A consistent set of literals can be interpreted as a conjunction or as a disjunction of literals; this will be clear from the context. When interpreted as a conjunction, the set of literals denotes a cube in instance space, where the unspecified features can take any value of their domain. When interpreted as a disjunction, the set of literals denotes a clause in instance space; as before, the unspecified features can take any value of their domain.

The remainder of the paper assumes a classification problem with a set of classes K = {κ_1, ..., κ_M}. A prediction π ∈ K is associated with each instance X ∈ F. Throughout this paper, an ML model M will be associated with some logical representation (or encoding), whose consistency depends on the (input) instance and (output) prediction.
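The definitions above can be rendered concretely as follows. This is only an illustrative sketch: the feature names and domains are made up, and literals are represented as (feature, value) pairs.

```python
# Toy feature space (illustrative names and domains, not from the paper).
DOMAINS = {"f1": {0, 1, 2}, "f2": {0, 1}, "f3": {0, 1}}

def consistent(literals):
    """A set of (feature, value) literals is consistent iff it contains
    at most one literal per feature."""
    feats = [f for f, _ in literals]
    return len(feats) == len(set(feats))

def cube_contains(cube, instance):
    """A cube (conjunction of literals) covers an instance iff every
    specified feature takes the specified value; unspecified features
    are unconstrained."""
    return all(instance[f] == v for f, v in cube)

def clause_satisfied(clause, instance):
    """A clause (disjunction of literals) holds iff at least one of its
    literals holds in the instance."""
    return any(instance[f] == v for f, v in clause)

instance = {"f1": 2, "f2": 0, "f3": 1}   # an assignment to all features
cube = {("f1", 2), ("f3", 1)}            # f1 = 2 AND f3 = 1
clause = {("f1", 0), ("f2", 0)}          # f1 = 0 OR f2 = 0

assert consistent(cube)
assert not consistent({("f1", 0), ("f1", 2)})
assert cube_contains(cube, instance)
assert clause_satisfied(clause, instance)
```

Note how the same set of literals acts as a cube or a clause depending only on how it is interpreted, matching the text.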
Thus, we define a predicate M ⊆ F × K, such that M(X, π) is true iff the input X is consistent with prediction π given the ML model M. We further simplify the notation by using M_π(X) to denote the predicate M(X, π) for a concrete prediction π. Moreover, we will compute prime implicants of M_π. These predicates are defined on F and represented as consistent conjunctions (or, alternatively, sets) of feature literals. Concretely, a consistent



Footnotes:
1. A taxonomy of ML model explanations used in this paper is included in Appendix A.
2. There is also a recent XAI service offered by Google: https://cloud.google.com/explainable-ai/, inspired by similar ideas Google (2019).
3. In contrast with recent work Ignatiev et al. (2019b), which studies the relationship between global model-based (abductive) explanations and adversarial examples.
4. A local abductive (resp. contrastive) explanation is a minimal hitting set of the set of all local contrastive (resp. abductive) explanations.
5. M is referred to as the (formal) model of the ML model M. The use of FOL is not restrictive, with fragments of FOL being used in recent years for modeling ML models in different settings. These include NNs Ignatiev et al. (2019a) and Bayesian Network Classifiers Shih et al. (2019), among others.
6. This alternative notation is used for simplicity and clarity with respect to earlier work Shih et al. (2018); Ignatiev et al. (2019a;b). Furthermore, defining M as a predicate allows for multiple predictions for the same point in feature space. Nevertheless, such cases are not considered in this paper.



Recent surveys and overviews include Hoffman et al. (2017); Biran & Cotton (2017); Montavon et al. (2018); Klein (2018); Hoffman et al. (2018a); Adadi & Berrada (2018); Alonso et al. (2018); Dosilovic et al. (2018); Hoffman et al. (2018b); Guidotti et al. (2019); Samek et al. (2019); Samek & Müller (2019); Miller (2019b;a); Anjomshoae et al. (2019); Mittelstadt et al. (2019); Xu et al. (2019).

Past work on computing explanations has mostly addressed local (or instance-dependent) explanations Ribeiro et al. (2016); Lundberg & Lee (2017); Ribeiro et al. (2018); Shih et al. (2018; 2019); Ignatiev et al. (2019a); Darwiche & Hirth (2020); Darwiche (2020). Exceptions include, for example, approaches that distill ML models, e.g. the case of NNs Frosst & Hinton (2017), among many others Ribeiro et al. (2016), or recent work on relating explanations with adversarial examples Ignatiev et al. (2019b), both of which can be seen as seeking global (or instance-independent) explanations.

Prior research has also mostly considered model-agnostic explanations Ribeiro et al. (2016); Lundberg & Lee (2017); Ribeiro et al. (2018). Recent work on model-based explanations, e.g. Shih et al. (2018); Ignatiev et al. (2019a), refers to local (or global) model-agnostic explanations as heuristic, given that these approaches offer no formal guarantees with respect to the underlying ML model. Examples of heuristic approaches include Ribeiro et al. (2016); Lundberg & Lee (2017); Ribeiro et al. (2018), among many others. In contrast, local (or global) model-based explanations are referred to as rigorous, since these offer the strongest formal guarantees with respect to the underlying ML model. Concrete examples of such rigorous approaches include Shih et al. (2018); Tran & d'Avila Garcez (2018); Shih et al. (2019); Ignatiev et al. (2019a;b); Darwiche & Hirth (2020); Jha et al. (2019). Most work on computing explanations aims to answer a 'Why prediction π?' question.
Some work proposes approximating the ML model's behavior with a linear model Ribeiro et al. (2016); Lundberg & Lee (2017). Most other work seeks to find an (often minimal) set of feature-value pairs which is sufficient for the prediction, i.e. as long as those features take the specified values, the prediction does not change. For rigorous approaches, the answer to a 'Why prediction π?' question has been referred to as PI-explanations Shih et al. (2018; 2019), abductive explanations Ignatiev et al. (2019a), but also as (minimal) sufficient reasons Darwiche & Hirth (2020); Darwiche (2020). (Hereinafter, we use the term abductive explanation because of the other forms of explanations studied in the paper.)
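The idea of finding a minimal sufficient set of feature-value pairs can be sketched with a simple deletion-based procedure: start from all literals of the instance and drop each one whose removal keeps the remaining set sufficient. The toy model and brute-force sufficiency check below are illustrative stand-ins; rigorous approaches replace the check with a reasoning oracle (e.g. SAT/SMT/MILP) over a logical encoding of the actual ML model.

```python
from itertools import product

# Illustrative classifier: predicts 1 iff x1 AND (x2 OR x3).
FEATURES = [0, 1, 2]

def predict(x):
    return int(x[0] and (x[1] or x[2]))

def sufficient(fixed, inst):
    """Brute-force stand-in for an entailment check: does fixing the
    features in `fixed` force the prediction for every completion?"""
    free = [f for f in FEATURES if f not in fixed]
    for vals in product([0, 1], repeat=len(free)):
        x = list(inst)
        for f, v in zip(free, vals):
            x[f] = v
        if predict(x) != predict(inst):
            return False
    return True

def one_axp(inst):
    """Deletion-based extraction of one subset-minimal sufficient set:
    try to drop each feature; keep the drop only if the remaining
    literals still entail the prediction."""
    expl = set(FEATURES)
    for f in FEATURES:
        if sufficient(expl - {f}, inst):
            expl.discard(f)
    return expl

print(one_axp((1, 1, 1)))
```

For the instance (1, 1, 1) this returns one of the two minimal sufficient sets of the toy model; which one is found depends on the order in which drops are attempted, a standard property of deletion-based minimization.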

Explanations for 'Why Not?' questions have also been studied in recent work, e.g. Dhurandhar et al. (2018); Mittelstadt et al. (2019). In this paper we focus on the relationship between local abductive and contrastive explanations. One of our contributions is to show how recent approaches for computing rigorous abductive explanations Shih et al. (2018; 2019); Ignatiev et al. (2019a); Darwiche & Hirth (2020); Darwiche (2020) can also be exploited for computing contrastive explanations.
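One algorithmic consequence of the hitting-set relationship can be sketched as follows: given all abductive explanations of an instance, the contrastive explanations can be enumerated as the minimal hitting sets of that collection (and vice-versa). The input family below is toy data, taken from an illustrative model x1 AND (x2 OR x3) at instance (1, 1, 1), not from the paper's experiments.

```python
from itertools import combinations

# Toy abductive explanations (feature index sets) of the illustrative
# model x1 AND (x2 OR x3) at instance (1, 1, 1).
FEATURES = [0, 1, 2]
axps = [{0, 1}, {0, 2}]

def minimal_hitting_sets(family, universe):
    """Enumerate minimal hitting sets by increasing size; any candidate
    containing an already-found hitting set is non-minimal and skipped."""
    hits = lambda s: all(s & m for m in family)
    found = []
    for r in range(1, len(universe) + 1):
        for c in combinations(universe, r):
            s = set(c)
            if hits(s) and not any(h <= s for h in found):
                found.append(s)
    return found

# By the duality, these are the contrastive explanations of the instance.
cxps = minimal_hitting_sets(axps, FEATURES)
print(cxps)
```

The same routine run on the contrastive explanations recovers the abductive ones, mirroring the "vice-versa" direction of the result; practical enumerators replace this brute force with implicit hitting-set algorithms in the style of Reiter (1987).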

