CORTX: CONTRASTIVE FRAMEWORK FOR REAL-TIME EXPLANATION

Abstract

Recent advancements in explainable machine learning provide effective and faithful solutions for interpreting model behaviors. However, many explanation methods encounter efficiency issues, which largely limit their deployment in practical scenarios. Real-time explainer (RTX) frameworks have thus been proposed to accelerate the model explanation process by learning a one-feed-forward explainer. Existing RTX frameworks typically build the explainer under the supervised learning paradigm, which requires large amounts of explanation labels as the ground truth. Considering that accurate explanation labels are usually hard to obtain due to constrained computational resources and limited human effort, effective explainer training remains challenging in practice. In this work, we propose a COntrastive Real-Time eXplanation (CoRTX) framework to learn an explanation-oriented representation and relieve the intensive dependence of explainer training on explanation labels. Specifically, we design a synthetic strategy to select positive and negative instances for explanation learning. Theoretical analysis shows that our selection strategy can benefit the contrastive learning process on explanation tasks. Experimental results on three real-world datasets further demonstrate the efficiency and efficacy of our proposed CoRTX framework.

1. INTRODUCTION

The remarkable progress in explainable machine learning (ML) has significantly improved model transparency to human beings (Du et al., 2019). However, applying explainable ML techniques in real-time scenarios remains a challenging task. Real-time systems typically require model explanations to be not only effective but also efficient (Stankovic et al., 1992). Due to requirements from both stakeholders and social regulations (Goodman & Flaxman, 2017; Floridi, 2019), efficient model explanation is necessary for real-time ML systems, such as control systems (Steel & Angwin, 2010), online recommender systems (Yang et al., 2018), and healthcare monitoring systems (Gao et al., 2017). Nevertheless, existing non-amortized explanation methods, including LIME (Ribeiro et al., 2016) and KernelSHAP (Lundberg & Lee, 2017), suffer from high explanation latency. These methods rely on either multiple perturbations or backpropagation through deep neural networks (DNN) to derive explanations (Covert & Lee, 2021; Liu et al., 2021), which is time-consuming and limits their deployment in real-time scenarios.

Real-time explainer (RTX) frameworks have thus been proposed to address these efficiency issues and provide effective explanations for real-time systems (Dabkowski & Gal, 2017; Jethani et al., 2021b). Specifically, RTX learns an overall explainer on the training set by using ground-truth explanation labels obtained through either exact calculation or approximation. RTX then provides the explanation for each local instance via a single feed-forward process. Existing efforts on RTX can be categorized into two lines of work. The first line (Schwab & Karlen, 2019; Jethani et al., 2021b; Covert et al., 2022) explicitly learns an explainer that minimizes the estimation error with respect to the approximated explanation labels.
The second line (Dabkowski & Gal, 2017; Chen et al., 2018; Kanehira & Harada, 2019) trains a feature mask generator subject to certain constraints on a pre-defined label distribution. Despite the effectiveness of existing RTX frameworks, recent advancements still rely on large amounts of explanation labels under the supervised learning paradigm. The computational cost of obtaining explanation labels is extremely high (Roth, 1988; Winter, 2002), which thereby limits RTX's deployment in real-world scenarios.

To tackle the aforementioned challenges, we propose a COntrastive Real-Time eXplanation (CoRTX) framework based on contrastive learning techniques. CoRTX aims to learn the latent explanation of each data instance without any ground-truth explanation label. The latent explanation of an instance is defined as a vector encoded with explanation information. Contrastive learning has been widely exploited to improve the learning process of downstream tasks by providing well-pretrained representative embeddings (Arora et al., 2019; He et al., 2020). In particular, task-oriented selection strategies for positive and negative pairs (Chen et al., 2020; Khosla et al., 2020) can shape the representation properties learned through contrastive learning. Motivated by this contrastive scheme, CoRTX develops an explanation-oriented contrastive framework to learn the latent explanation, with the goal of further fine-tuning an explanation head for the downstream tasks. CoRTX learns the latent explanation for explanation tasks by minimizing a contrastive loss (Van den Oord et al., 2018). Specifically, CoRTX designs a synthetic positive and negative sampling strategy to learn the latent explanation. The obtained latent explanation can then be transformed into feature attributions or rankings by fine-tuning a corresponding explanation head using a small number of explanation labels.
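To make the contrastive objective concrete, the sketch below shows a single-anchor InfoNCE-style loss of the kind cited above (Van den Oord et al., 2018). This is a simplified illustration, not CoRTX's exact objective: the anchor, positive, and negative vectors stand in for latent explanations of an instance, its synthetically selected positive, and its negatives, and the cosine-similarity form and temperature value are assumptions for illustration.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style contrastive loss for a single anchor.

    The loss is low when the anchor embedding is similar to its positive
    and dissimilar to all negatives, which is the property an
    explanation-oriented encoder is trained to satisfy.
    """
    def cos_sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_score = np.exp(cos_sim(anchor, positive) / temperature)
    neg_scores = sum(np.exp(cos_sim(anchor, n) / temperature) for n in negatives)
    # Negative log-probability of picking the positive among all candidates.
    return -np.log(pos_score / (pos_score + neg_scores))
```

As expected, the loss decreases as the positive is pulled closer to the anchor: an anchor paired with a nearby positive yields a smaller loss than the same anchor paired with a distant one, which is the gradient signal that shapes the latent explanation space.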
Theoretical analysis and experimental results demonstrate that CoRTX can successfully provide effective latent explanations for feature attribution and ranking tasks. Our contributions can be summarized as follows:
• CoRTX provides a contrastive framework for deriving latent explanations, which effectively reduces the required amount of explanation labels;
• Theoretical analysis indicates that CoRTX can effectively learn the latent explanation over the training set and strictly bound the explanation error;
• Experimental results demonstrate that CoRTX can efficiently provide explanations for the target model, and is applicable to both tabular and image data.

2.1. NOTATIONS

We consider an arbitrary target model f(·) to interpret. Let an input instance be x = [x_1, ..., x_M] ∈ X, where x_i denotes the value of feature i for 1 ≤ i ≤ M. The contribution of each feature to the model output can be treated as a cooperative game on the feature set. Specifically, the preceding difference f(x_{S∪{i}}) - f(x_S) indicates the contribution of feature i under a feature subset S ⊆ U \ {i}, where U is the entire feature set. The overall contribution of feature i is formalized as the average preceding difference over all possible feature subsets S, which can be formally given by

    ϕ_i(x) := E_{S⊆U\{i}} [ f(x_{S∪{i}}) - f(x_S) ],    (1)

where x_S = s ⊙ x + (1 - s) ⊙ x^r denotes the perturbed sample, s = 1_S ∈ {0, 1}^M is the masking vector of S, and x^r = E[x | x ∼ P(x)] denotes the reference values* drawn from the feature distribution P(x). The computational complexity of Equation 1 grows exponentially with the number of features M, which hinders its application in real-time scenarios. To this end, we propose an efficient explanation framework for real-time scenarios in this work.
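The exponential cost of Equation 1 can be seen directly in a brute-force implementation. The sketch below computes exact Shapley-style attributions by enumerating every subset S ⊆ U \ {i}, using the standard Shapley weighting (the expectation in Equation 1 corresponds to a particular distribution over subsets) and a fixed reference vector x^r for masked features. The function names and the linear test model are illustrative assumptions, not part of the paper.

```python
import itertools
import math

def exact_attributions(f, x, x_ref):
    """Exact per-feature attributions via full subset enumeration.

    f:     model mapping a feature list to a scalar output
    x:     the instance to explain
    x_ref: reference values substituted for masked-out features

    Enumerates all 2^(M-1) subsets per feature, so the cost grows
    exponentially in M -- the bottleneck that motivates amortized RTX.
    """
    M = len(x)
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for k in range(M):
            for S in itertools.combinations(others, k):
                # Shapley weight of a subset of size k.
                w = math.factorial(k) * math.factorial(M - k - 1) / math.factorial(M)
                x_S = [x[j] if j in S else x_ref[j] for j in range(M)]
                x_Si = [x[j] if (j in S or j == i) else x_ref[j] for j in range(M)]
                # Preceding difference: marginal contribution of feature i.
                phi[i] += w * (f(x_Si) - f(x_S))
    return phi
```

For a linear model the attributions recover the coefficients scaled by the deviation from the reference, e.g. f(v) = 2·v_0 + 3·v_1 explained at x = [1, 1] with x^r = [0, 0] yields ϕ = [2, 3].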

2.2. REAL-TIME EXPLAINER

Different from existing non-amortized explanation methods (Lundberg & Lee, 2017; Lomeli et al., 2019) that utilize local surrogate models to explain individual data instances, RTX trains a global model that provides fast explanations via one feed-forward process. Compared with existing methods, the advantages of RTX are mainly twofold: (1) faster explanation generation; and (2) more robust explanation derivation. Generally, existing RTXs attempt to learn the overall explanation distribution using two lines of methodologies: Shapley-sampling-based approaches (Wang et al., 2021; Jethani et al., 2021b; Covert et al., 2022) and feature-selection-based approaches (Chen et al., 2018;
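The amortization idea behind RTX can be sketched as follows: attribution labels are precomputed offline for a training set, a global explainer is fit to map inputs directly to attributions, and inference then costs a single forward pass. The sketch below uses a linear explainer fit by least squares purely for illustration; actual RTX frameworks train deep networks, and the synthetic data, coefficients, and function names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline stage: attribution labels Phi were precomputed
# (e.g. by an exact or sampling-based method) for a training set X.
X = rng.normal(size=(256, 4))
Phi = X * np.array([2.0, -1.0, 0.5, 0.0])  # stand-in "ground-truth" labels

# Fit a global explainer g_W(x) = x @ W to the labels. A linear map
# suffices here because the stand-in labels are linear in x; real RTX
# frameworks use DNN explainers trained by gradient descent.
W, *_ = np.linalg.lstsq(X, Phi, rcond=None)

def explain(x):
    """One feed-forward pass: no perturbations, no per-instance optimization."""
    return x @ W
```

Once trained, `explain` amortizes the exponential subset enumeration into a constant-time matrix product per instance, which is what makes deployment in real-time systems feasible.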



* Other statistical measurements can also be adopted for generating the reference values.


