KABEDONN: POSTHOC EXPLAINABLE ARTIFICIAL INTELLIGENCE WITH DATA ORDERED NEURAL NETWORK

Abstract

Different approaches to eXplainable Artificial Intelligence (XAI) have been explored, including (1) the systematic study of the effect of individual training data samples on the final model and (2) posthoc attribution methods that assign importance values to the components of each data sample. Combining concepts from both approaches, we introduce kaBEDONN, an ordered dataset coupled with a posthoc, model-agnostic method for querying relevant training data samples. These relevant data are intended as explanations for model predictions that are both user-friendly and easily adjustable by developers. Explanations can thus be fine-tuned, and damage control can be performed with ease.
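The idea of querying relevant training samples as explanations can be illustrated, in a highly simplified form, by a nearest-neighbour lookup in a model's feature space: training samples whose features lie closest to the query's features are returned as candidate explanations. The sketch below assumes precomputed feature vectors and a Euclidean metric; the function name `query_relevant` and the toy data are illustrative choices, not kaBEDONN's actual node-based mechanism, which is described in the body of the paper.

```python
import numpy as np

def query_relevant(x_feat, train_feats, k=3):
    """Return indices of the k training samples whose feature vectors
    are closest (in Euclidean distance) to the query's feature vector.
    A generic nearest-neighbour sketch, not kaBEDONN's node query."""
    dists = np.linalg.norm(train_feats - x_feat, axis=1)
    return np.argsort(dists)[:k]

# Toy example: five training "embeddings" in R^2.
train_feats = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.], [4., 5.]])
query = np.array([0.9, 0.1])
print(query_relevant(query, train_feats, k=2))  # prints [1 0]
```

In a real pipeline the feature vectors would come from an intermediate layer of the trained base model, so that "closeness" reflects similarity as perceived by the model rather than raw pixel distance.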

1. INTRODUCTION

Although machine learning (ML) algorithms are not expected to be perfect, their unexplained failures can be detrimental, e.g. the well-known incident of a 'racist' algorithm (BBC, 2015). eXplainable Artificial Intelligence (XAI) has emerged as an effort to improve trust in the use of ML algorithms. It is a burgeoning field that has recently been studied from different aspects, such as (1) data influence on model training, (2) post-hoc attribution methods, and (3) "signal methods", among others; some methods fall into multiple categories, as seen in surveys such as Arrieta et al. (2020); Gilpin et al. (2018); Tjoa & Guan (2020); Adadi & Berrada (2018). With improved trust, powerful blackbox models like the deep neural network (DNN) can be adopted into real applications with more accountability.

Combining some of these existing concepts, we introduce the k-width and Bifold Embedded Data Ordered Neural Network (kaBEDONN), a post-hoc XAI method that queries relevant data as the explanation for a model prediction. All Python code is available in the supplementary materials. Here, we consider the image classification task, with experiments on the common image datasets MNIST, CIFAR10, and ImageNet. Denote a data sample as (x, y0) ∈ D = X × Y, where X is the input space and y0 ∈ Y is the ground-truth class label. The sample x is classified using some base model f as c = argmax_i y_i, where y = f(x) ∈ R^C and C is the number of classes/categories. The scenario considered in this paper is the following: users wish to know why f labels the sample as c, i.e. they require explanations for the predictions. Like many XAI methods, kaBEDONN aims to provide a form of explanation. We start by clarifying our three objectives.

Objective 1. Relevant data as explanations. The relevance of data has been measured in different ways. In Koh & Liang (2017), a training data sample is considered either helpful or harmful to the prediction made by a trained model, quantified by the influence score. In Yeh et al. (2018), data samples are either excitatory or inhibitory; in Pruthi et al. (2020), proponent or opponent. For kaBEDONN, relevant data samples strongly activate main nodes or sub-nodes; hence, they are excitatory in a different sense than in Yeh et al. (2018). Here, explanatory images are considered relevant when their features look "similar" to x according to the base model f. The explanatory images are then presented to users as shown in fig. 1(A) and fig. 2(A). More technically, we have three different contexts of "similar". (1) A representative data sample r is a training data sample that has been used to construct a main node in kaBEDONN. In this case, kaBEDONN stores, in a main node, the processed signals of r (also called its "fingerprint" in Tjoa & Cuntai (2021)) and r's index w.r.t. the ordered dataset. (2) A similar data sample s is a training data sample that has been included as neither a main node nor a sub-node because it is already well-represented

