EVALUATION OF SIMILARITY-BASED EXPLANATIONS

Abstract

Explaining the predictions made by complex machine learning models helps users understand and accept the predicted outputs with confidence. One promising approach is the similarity-based explanation, which provides similar training instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated which relevance metrics can provide reasonable explanations to users. Specifically, we adopted three tests to evaluate whether the relevance metrics satisfy the minimal requirements for similarity-based explanation. Our experiments revealed that the cosine similarity of the gradients of the loss performs best, which would be a recommended choice in practice. In addition, we showed that some metrics perform poorly in our tests and analyzed the reasons for their failure. We expect our insights to help practitioners in selecting appropriate relevance metrics and also to aid further research on designing better relevance metrics for explanations.

1. INTRODUCTION

Explaining the predictions made by complex machine learning models helps users understand and accept the predicted outputs with confidence (Ribeiro et al., 2016; Lundberg & Lee, 2017; Guidotti et al., 2018; Adadi & Berrada, 2018; Molnar, 2020). Instance-based explanations are a popular type of explanation that achieves this goal by presenting one or several training instances that support the predictions of a model. Several types of instance-based explanations have been proposed, such as explaining with instances similar to the instance of interest (i.e., the test instance in question) (Charpiat et al., 2019; Barshan et al., 2020); harmful instances that degrade the performance of models (Koh & Liang, 2017; Khanna et al., 2019); counter-examples that contrast how a prediction can be changed (Wachter et al., 2018); and irregular instances (Kim et al., 2016). Among these, we focus on the first type: the explanation that gives one or several training instances that are similar to the test instance in question, together with the corresponding model predictions. We refer to this type of instance-based explanation as similarity-based explanation. A similarity-based explanation is of the form "I (the model) think this image is a cat because similar images I saw in the past were also cats." This type of explanation is analogous to the way humans make decisions by referring to their prior experiences (Klein & Calderwood, 1988; Klein, 1989; Read & Cesa, 1991). Hence, it tends to be easy to understand even for users with little expertise in machine learning. One report stated that with this type of explanation, users tend to have higher confidence in model predictions than with explanations that present contributing features (Cunningham et al., 2003).

In the instance-based explanation paradigm, including similarity-based explanation, a relevance metric R(z, z′) ∈ ℝ is typically used to quantify the relationship between two instances, z = (x, y) and z′ = (x′, y′).
Definition 1 (Instance-based Explanation Using Relevance Metric). Let D = {z_train^(i) = (x_train^(i), y_train^(i))}_{i=1}^N be a set of training instances and x_test be a test input of interest whose predicted output is given by y_test = f(x_test) with a predictive model f. An instance-based explanation method gives the most relevant training instance z* ∈ D to the test instance z_test = (x_test, y_test) by z* = argmax_{z_train ∈ D} R(z_test, z_train), using a relevance metric R(z_test, z_train).

Table 1: The relevance metrics and their evaluation results. For the model randomization test, the results that passed the test are colored. For the identical class test and the identical subclass test, the results with the five highest average evaluation scores are colored. The details of the relevance metrics, the evaluation criteria, and the evaluation procedures can be found in Sections 1.2, 3, and 4, respectively.

As summarized in Table 1, our experiments revealed that (i) the cosine similarity of gradients performs best, which is probably a recommended choice for similarity-based explanation in practice, and (ii) some relevance metrics demonstrated poor performance on the identical class and identical subclass tests, indicating that their use should be deprecated for similarity-based explanation. We also analyzed the reasons behind the success and failure of the metrics. We expect these insights to help practitioners in selecting appropriate relevance metrics.
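The selection rule in Definition 1 amounts to a single argmax over the training set. The following is a minimal Python sketch of that rule; the toy metric here (negative squared distance between inputs) and all helper names are illustrative placeholders, not one of the metrics evaluated in this study:

```python
# Sketch of Definition 1: pick the training instance z* that maximizes
# a relevance metric R(z_test, z_train) over the training set D.

def relevance(z_test, z_train):
    """Toy relevance metric: negative squared Euclidean distance
    between the two inputs (labels are ignored by this toy metric)."""
    x_test, _ = z_test
    x_train, _ = z_train
    return -sum((a - b) ** 2 for a, b in zip(x_test, x_train))

def most_relevant_instance(train_set, z_test):
    """Return z* = argmax over z_train in D of R(z_test, z_train)."""
    return max(train_set, key=lambda z_train: relevance(z_test, z_train))

# Tiny example: each instance is (input vector, label).
train_set = [([0.0, 0.0], "cat"), ([1.0, 1.0], "dog"), ([0.3, 1.5], "dog")]
z_test = ([1.0, 0.9], "dog")
explanation = most_relevant_instance(train_set, z_test)
print(explanation)  # ([1.0, 1.0], 'dog')
```

In practice, `relevance` would be replaced by one of the similarity or gradient-based metrics described in Section 1.2.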

1.2. RELEVANCE METRICS

We present an overview of the two types of relevance metrics considered in this study, namely similarity metrics and gradient-based metrics. To the best of our knowledge, all major relevance metrics proposed so far fall into one of these two categories.



Previously proposed relevance metrics include similarity (Caruana et al., 1999), kernel functions (Kim et al., 2016; Khanna et al., 2019), and influence functions (Koh & Liang, 2017).

Our implementation is available at https://github.com/k-hanawa/criteria_for_instance_based_explanation



Notations For vectors a, b ∈ ℝ^p, we denote the dot product by ⟨a, b⟩ := Σ_{i=1}^p a_i b_i, the ℓ2 norm by ‖a‖ := √⟨a, a⟩, and the cosine similarity by cos(a, b) := ⟨a, b⟩ / (‖a‖ ‖b‖).

Classification Problem We consider a standard classification problem as the evaluation benchmark, which is the most actively explored application of instance-based explanations. The model is the conditional probability p(y | x; θ) with parameter θ. Let θ̂ be the trained parameter θ̂ = argmin_θ L_train(θ) := (1/N) Σ_{i=1}^N ℓ(z_train^(i); θ), where the loss function is the cross entropy ℓ(z; θ) = −log p(y | x; θ) for an input–output pair z = (x, y). The model classifies a test input x_test by assigning the class with the highest probability, y_test = argmax_y p(y | x_test; θ̂).
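To make these notations concrete, the sketch below computes the gradient of the cross-entropy loss for a toy binary logistic-regression model and then the cosine similarity between two such gradients, which is the form of the best-performing metric in this study. The toy model and all names are illustrative assumptions, not the paper's experimental setup:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def loss_gradient(w, x, y):
    """Gradient of the cross-entropy loss -log p(y|x; w) w.r.t. w for
    a toy model p(y=1|x; w) = sigmoid(<w, x>): it equals (p - y) * x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def cos(a, b):
    """Cosine similarity <a, b> / (||a|| ||b||)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return dot / (na * nb)

def grad_cos_relevance(w, z_test, z_train):
    """R(z_test, z_train) = cos(grad loss(z_test), grad loss(z_train))."""
    (x_t, y_t), (x_s, y_s) = z_test, z_train
    return cos(loss_gradient(w, x_t, y_t), loss_gradient(w, x_s, y_s))

w = [1.0, -1.0]                 # "trained" parameters of the toy model
z_test = ([1.0, 0.0], 1)
same_class = ([0.9, 0.1], 1)
other_class = ([0.9, 0.1], 0)
print(grad_cos_relevance(w, z_test, same_class))   # close to +1
print(grad_cos_relevance(w, z_test, other_class))  # negative
```

With the same input, flipping the label flips the sign of the gradient, so the instance with the matching label receives a much higher relevance under this metric.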

Contributions We provide the first answer to the question of which relevance metrics have desirable properties for similarity-based explanation. For this purpose, we propose to use three minimal-requirement tests to evaluate various relevance metrics in terms of their appropriateness. The first test is the model randomization test, originally proposed by Adebayo et al. (2018) for evaluating saliency-based methods; the other two tests, the identical class test and the identical subclass test, are newly designed in this study. The results are summarized in Table 1.
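As a concrete illustration, the identical class test could be scored as follows: a metric does well if, for each test instance, the top-1 relevant training instance shares the label of the model's predicted class. The nearest-input metric, the toy predictor, and all helper names below are hypothetical stand-ins, not the paper's implementation:

```python
# Hedged sketch of the identical class test: the fraction of test inputs
# whose most relevant training instance carries the same label as the
# model's predicted class (1.0 means the metric passes on every input).

def relevance(x_test, x_train):
    """Toy relevance metric: negative squared distance between inputs."""
    return -sum((a - b) ** 2 for a, b in zip(x_test, x_train))

def identical_class_rate(train_set, test_set, predict):
    hits = 0
    for x_test in test_set:
        y_pred = predict(x_test)
        # Top-1 relevant training instance under the metric.
        _, y_top = max(train_set, key=lambda z: relevance(x_test, z[0]))
        hits += (y_top == y_pred)
    return hits / len(test_set)

train_set = [([0.0, 0.0], "cat"), ([1.0, 1.0], "dog")]
test_set = [[0.1, 0.0], [0.9, 1.0]]
predict = lambda x: "cat" if sum(x) < 1.0 else "dog"
print(identical_class_rate(train_set, test_set, predict))  # 1.0
```

The model randomization test follows the same top-1 retrieval, but instead compares explanations from trained and randomly re-initialized parameters.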

