KABEDONN: POSTHOC EXPLAINABLE ARTIFICIAL INTELLIGENCE WITH DATA ORDERED NEURAL NETWORK

Abstract

Different approaches to eXplainable Artificial Intelligence (XAI) have been explored, including (1) the systematic study of the effect of individual training data samples on the final model and (2) posthoc attribution methods that assign importance values to the components of each data sample. Combining concepts from both approaches, we introduce kaBEDONN, a system comprising an ordered dataset coupled with a posthoc, model-agnostic method for querying relevant training data samples. These relevant data serve as explanations for model predictions that are both user-friendly and easily adjustable by developers. Explanations can thus be finetuned, and damage control can be performed with ease.

1. INTRODUCTION

Although machine learning (ML) algorithms are not expected to be perfect, their unexplained failures can be detrimental, e.g. the well-known incident of a 'racist' algorithm bbc (2015). eXplainable Artificial Intelligence (XAI) has emerged as an effort to improve trust in the use of ML algorithms. It is a burgeoning field that has recently been studied from different aspects, such as (1) data influence on model training, (2) post-hoc attribution methods, and (3) "signal methods" (some methods fall into multiple categories, as seen in surveys like Arrieta et al. (2020); Gilpin et al. (2018); Tjoa & Guan (2020); Adadi & Berrada (2018)). With improved trust, powerful blackbox models like the deep neural network (DNN) can be adopted into real applications with more accountability.

Combining some of these existing concepts, we introduce the k-width and Bifold Embedded Data Ordered Neural Network (kaBEDONN), a post-hoc XAI method that queries relevant data as the explanation for a model prediction. All python codes are available in the supp. materials. Here, we consider the image classification task, including experiments on the common image datasets MNIST, CIFAR10 and ImageNet.

Denote a sample data as (x, y0) ∈ D = X × Y, where X is the input space and y0 ∈ Y the ground-truth class label. x is classified using some base model f as c = argmax_i(y_i), where y = f(x) ∈ R^C and C is the number of classes/categories. The scenario considered in this paper is the following: users wish to know why f labels the sample as c, i.e. they require explanations for the predictions. Like many XAI methods, kaBEDONN aims to provide a form of explanation. We start by clarifying our three objectives.

Objective 1. Relevant data as explanations. The relevance of data has been measured in different ways. In Koh & Liang (2017), a training data sample is considered either helpful or harmful to the prediction made by a trained model, quantified by the influence score. In Yeh et al.
(2018), data samples are either excitatory or inhibitory; in Pruthi et al. (2020), proponent or opponent. For kaBEDONN, relevant data samples strongly activate main nodes or sub-nodes; hence, they are excitatory in a different sense from Yeh et al. (2018). Here, explanatory images are considered relevant when their features look "similar" to x according to the base model f. The explanatory images are then presented to users as shown in fig. 1(A) and fig. 2(A).

More technically, we have three different contexts of "similar". (1) A representative data r is a training data sample that has been used to construct a main node in kaBEDONN. In this case, kaBEDONN stores the processed signals of r (also called a "fingerprint" in Tjoa & Cuntai (2021)) and r's index w.r.t. the ordered dataset in a main node. (2) A similar data s is a training data sample that has been included as neither a main node nor a sub-node, because it is already well-represented by an existing main node. Only the index of s (but not its fingerprint) is stored in a well-represented (WR) node (that belongs to some main node). (3) A boundary data is a training data sample that is similar enough to a representative data r but is different (they have different class labels); this happens, for example, when similar-looking breeds of cats are labeled differently. Its fingerprint and index are stored in a sub-node of the main node constructed from r.

Objective 2. Adjustment of explanations for debugging. Explanations do not necessarily convince every user to the same degree. Suppose users flag some explanations as unsuitable. kaBEDONN is a system that allows developers to quickly finetune explanations based on the given feedback, primarily by adding, removing or reordering the underlying data samples used for kaBEDONN construction. This is useful because problematic data samples (e.g.
accidentally mislabeled data and overly-representative data*) can be identified and removed, while better explanations can be incorporated into the system when available. Remark. *An example of overly-representative data is an optical illusion; since it appears different from different points of view, it is not helpful as an explanation.

A real user feedback example from ImageNet. In fig. 2(B) panel 1, the image of interest (a bullfrog image) activates a main node with the wrong class label (hammerhead shark). Furthermore, we found that a cartoon image of a hammerhead shark in the ImageNet dataset is associated with that node. Suppose a user considers it undesirable; the developer then needs to quickly readjust the explanation. This is done by simply reordering the cartoon image (panel 2): we push it to the back of the queue by renaming the image. kaBEDONN is then reconstructed (panel 3), and a more "similar" representative is presented (the node appears to respond to a partially dark background). We have deliberately chosen this unclear and ambiguous bullfrog image from ImageNet to demonstrate how a problematic case can be handled. The result is thus not perfect; in practice, iterative user/developer feedback may be needed for better results. Also see appendix General info for more remarks.

Objective 3. Predictive correctness flag for debugging. kaBEDONN is a posthoc explanatory model complementary to a more complex blackbox base model f. It is constructed using the collection {(x_e, y0) : x_e = f_enc(x), (x, y0) ∈ D′ ⊆ D}, where x_e is a latent vector obtained from an encoder f_enc. The encoder can be f_enc = f itself or only the latent encoder part of f. In this paper, x_e = y = CNN(x), i.e. f_enc = f, for simplicity of demonstration. kaBEDONN can also act as a predictive model through multi-layered processing of latent vectors, partially using the universal approximation (UA) concept in Tjoa & Cuntai (2021).
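The collection of latent vectors above, and the querying of relevant training data by latent similarity, can be sketched as follows. This is a minimal illustration, not the paper's exact construction rule: `encode` stands in for f_enc, and cosine similarity is an assumed similarity measure between fingerprints.

```python
from math import sqrt

def build_latent_collection(encode, dataset):
    """Encode each (x, y0) pair into (x_e, y0), where x_e = encode(x)."""
    return [(encode(x), y0) for x, y0 in dataset]

def cosine(u, v):
    """Cosine similarity between two latent vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def query_relevant(encode, x, collection, k=3):
    """Indices of the k training samples whose latent vectors are most
    similar to encode(x); these would be presented as the explanation."""
    x_e = encode(x)
    scored = sorted(
        ((cosine(x_e, r_e), idx) for idx, (r_e, _) in enumerate(collection)),
        reverse=True,
    )
    return [idx for _, idx in scored[:k]]
```

In practice the encoder would be the CNN itself (f_enc = f, as in the paper), and the query would traverse the layered node structure rather than scan the whole collection.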
While kaBEDONN no longer has the UA property from Tjoa & Cuntai (2021), its data fitting capability is still very high (see the experiment and result section). The discrepancy between predictions made by kaBEDONN and the base model f (e.g. ResNet) serves as a flag for users/developers to report abnormal predictions.
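The correctness flag described above reduces to comparing the two argmax labels. A minimal sketch, with the class-score vectors of the base model and of kaBEDONN passed in as plain lists:

```python
def argmax_label(scores):
    """Index of the largest class score, i.e. c = argmax_i(y_i)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def predictions_agree(base_scores, kabedonn_scores):
    """True when the base model and kaBEDONN assign the same label;
    a False result flags the prediction as abnormal for review."""
    return argmax_label(base_scores) == argmax_label(kabedonn_scores)
```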



Figure 1: Inspired by influential images for explainable AI (Koh & Liang (2017); Yeh et al. (2018); Pruthi et al. (2020)), kaBEDONN provides explanations by querying relevant images from ordered data. (A) Sample images from the ImageNet dataset and the relevant data queried by kaBEDONN as explanations. (B) Data samples ordered by class y0 and position index idx. Data are queried in a deterministic order during kaBEDONN construction. (C) kaBEDONN with 3 layers and k_width = 4. Bifold embedding: the first fold consists of layers of main nodes; the second "upper" fold consists of sub-nodes. WR nodes are not shown.
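The bifold node structure in the caption can be sketched as plain dataclasses. Field names here are illustrative assumptions; the source only specifies that main nodes and sub-nodes store a fingerprint plus an index into the ordered dataset, while WR nodes store only an index.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubNode:
    """Boundary data: similar to the representative but differently labeled."""
    fingerprint: List[float]
    index: int  # position in the ordered dataset

@dataclass
class WRNode:
    """Well-represented data: only the index is kept, no fingerprint."""
    index: int

@dataclass
class MainNode:
    """Built from a representative data sample r."""
    fingerprint: List[float]
    index: int
    label: int
    sub_nodes: List[SubNode] = field(default_factory=list)  # the "upper" fold
    wr_nodes: List[WRNode] = field(default_factory=list)
```

Layers of MainNode objects would then form the first fold, with each layer holding at most k_width entries per step of construction.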

