IMPROVING EXPLANATION RELIABILITY THROUGH GROUP ATTRIBUTION

Abstract

Input attribution methods are a mainstream approach to understanding the predictions of DNNs because they offer straightforward interpretations, but the non-linearity of DNNs often makes the attributed scores unreliable in explaining a given prediction, deteriorating the faithfulness of the explanation. This challenge can be mitigated by explaining groups of explanatory components rather than individual components, since appropriate grouping can reduce interaction among the components. However, a group attribution does not explain the component-wise contributions, so its component-interpreted attribution is less reliable than the original component attribution, creating a trade-off between the two reliabilities. In this work, we first introduce generalized definitions of reliability loss and group attribution to formulate the reliability trade-off as an optimization problem. We then specialize our formulation to Shapley value attribution and propose the optimization method G-SHAP. Finally, we demonstrate the explanatory benefits of our method through experiments on image classification tasks.

1. INTRODUCTION

Advances in deep neural networks allow trained models to learn high-level semantic features in a variety of fields, but the intrinsic difficulty of explaining the predictions of DNNs remains a primary barrier to real-world applications, especially in domains that require trustworthy reasoning about model predictions. Various approaches have been proposed to tackle this challenge, including deriving the global behavior or knowledge of a trained model (Kim et al., 2018), explaining the semantics of a target neuron in a model (Ghorbani et al., 2019; Simonyan et al., 2013; Szegedy et al., 2015), and introducing self-interpretable models (Zhang et al., 2018; Dosovitskiy et al., 2020; Touvron et al., 2020; Arik & Pfister, 2019). Among them, input-attribution methods have become the mainstream of post-hoc explanation methods because they explain a model prediction by assigning a scalar score to each explanatory component (feature) of the input data, yielding a straightforward explanation for end-users through data-aligned visualizations such as heatmaps. However, since each explanatory component is explained with a single scalar score, the non-linearity of DNNs makes these scores less reliable in explaining the model's prediction. The result is a discrepancy between the explained and actual model behavior for a prediction, deteriorating the faithfulness of the explanation. As this is an inherent challenge of input attribution methods, the problem has been studied from various perspectives: Grabisch & Roubens (1999) formalize axiomatic interactions for cooperative games, Tsang et al. (2018) explain statistical interactions between input features from the learned weights of a DNN, Kumar et al. (2021) introduce Shapley Residuals to quantify the unexplained contribution of Shapley values, and Janizek et al. (2021) extend Integrated Gradients (Sundararajan et al., 2017) to Integrated Hessians to explain interactions between input features.
While these approaches improve explainability with respect to a DNN's non-linearity, their explanatory scores often do not correspond to individual explanatory components, reducing the interpretability of the explanations.

Figure 1: Trade-off of the dual reliability losses of a group attribution for a simple non-linear function. Grouping x_1, x_2 resolves their interaction, so it reduces the reliability loss of the group {x_1, x_2} but increases the losses of the component-interpreted scores. Here the attribution score and its reliability loss are defined as the input gradient and the expected L2 error of its tangent approximation, i.e., ∂f(x)/∂x_i and E_{t∼N(0,1)}[(f(x + t e_i) − f(x) − t φ_i)²], respectively.

Instead, the problem can be alleviated by explaining a model's prediction in terms of groups of explanatory components rather than individuals, termed group attribution. Appropriate grouping can weaken the interaction among the components, yielding a more reliable explanation. However, a group attribution does not attribute scores to the individual components, so interpreting a group attribution in terms of the individual components results in a less reliable explanation than the original component attribution. Therefore, both the group-wise and the component-interpreted attribution reliability should be considered when deriving a group attribution, implying a trade-off optimization problem. Figure 1 illustrates this problem with a simple non-linear function. In this paper, we present our work as follows: In Section 2, we introduce generalized definitions of reliability loss and group attribution to formulate the reliability trade-off as an optimization problem. In Section 3, we integrate our formalization with Shapley value attribution (Lundberg & Lee, 2017) and propose the grouping algorithm G-SHAP.
We choose the Shapley value as our scoring policy for two reasons: 1) it is a popular attribution method thanks to its model-agnostic character and well-founded axiomatic properties; 2) it becomes less reliable when there are strong interactions among the explanatory components' contributions, as it aggregates the contributions over all coalition states. In Section 4, we show the explanatory benefits of our method through experiments on image classification tasks: 1) we verify the grouping effect of G-SHAP through quantitative and visual analysis; 2) we validate our grouping approach by comparing it with several baseline grouping methods that would be expected to yield similar grouping results to ours; 3) we show the improvement in local explainability of a prediction through the estimation game, which adapts the deletion game (Petsiuk et al., 2018; Wagner et al., 2019) to measure the error of model output changes. Our contributions are summarized as follows: 1. We introduce two novel concepts to improve the limited reliability of input attribution methods: reliability loss, which quantifies the discrepancy between the explained and the actual model behavior for a prediction, and group attribution, which explains a prediction in terms of groups of explanatory components. Since a group attribution becomes less reliable in explaining component-wise contributions, we formulate an optimization problem to resolve the reliability trade-off. While we choose the Shapley value as our scoring policy, our formulation consists of generalized terms applicable to other input attribution methods. 2. We propose G-SHAP, a grouping algorithm for Shapley value attribution. We empirically show that G-SHAP has better local explainability of a model prediction than SHAP. We also validate the effectiveness of our grouping approach by comparing it with several baseline grouping methods that would be expected to yield similar grouping results to ours.

2.1. RELIABILITY LOSS OF AN ATTRIBUTION

Let y* = f(x*) be a model prediction to explain, where x* = (x*_1, ..., x*_N) ∈ R^N and f : R^N → R are the input data and the model function, respectively. Let Φ be the attributing (scoring) function that takes a model function f and a target input x* and returns the attribution scores φ ∈ R^N. As we consider f and x* fixed, we introduce the translated model function f*(x) = f(x + x*) to simplify the definitions that follow. Since explaining f(x) at x = x* is equivalent to explaining f*(x) at x = 0, we have

φ = (φ_1, ..., φ_N) = Φ(f, x*) = Φ(f*, 0) ∈ R^N    (1)

Similarly, let Ξ be a function that quantifies the reliability loss in explaining the prediction with an arbitrary attribution a = (a_1, ..., a_N) ∈ R^N:

ξ(a) = Ξ(f*, a) ≥ 0    (2)

where a lower value implies a more reliable attribution in explaining the prediction. We note that a = (a_1, ..., a_N) can be arbitrary, not necessarily φ.
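As a concrete instance of these definitions (the one used in Figure 1), Φ can be taken as the input gradient and Ξ as the expected L2 error of the tangent approximation, E_{t∼N(0,1)}[(f(x* + t e_i) − f(x*) − t a_i)²]. The sketch below estimates both numerically; the toy functions and Monte Carlo settings are illustrative assumptions, not part of our method.

```python
import numpy as np

def grad_attribution(f, x, eps=1e-5):
    """Attribution scores phi_i as the (central finite-difference) input gradient."""
    x = np.asarray(x, dtype=float)
    phi = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        phi[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return phi

def tangent_reliability_loss(f, x, phi, i, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E_{t~N(0,1)}[(f(x + t*e_i) - f(x) - t*phi_i)^2]."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x)
    e[i] = 1.0
    t = rng.standard_normal(n_samples)
    resid = np.array([f(x + ti * e) for ti in t]) - f(x) - t * phi[i]
    return float(np.mean(resid ** 2))

# A linear model is explained perfectly (loss ~ 0); a quadratic one is not:
# for f(x) = x_0^2 the residual is t^2, whose mean square is E[t^4] = 3.
f_lin = lambda x: 2.0 * x[0] - 3.0 * x[1]
f_sq = lambda x: x[0] ** 2
x0 = np.array([1.0, 1.0])
```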

2.2. GROUP ATTRIBUTION AND ITS RELIABILITY LOSS

A group attribution attributes a score to each group of explanatory components, where the components within each group are treated as one shared variable. Formally, let G = {G_1, ..., G_M} be a grouping (partition) of the component set X = {x_1, ..., x_N}. Then the group-mapped function f*_G assigns each group variable g_j to its corresponding component variables of X, defined as

f*_G(g_1, ..., g_M) = f*(g_σ(1), ..., g_σ(N))    (3)

where σ is the group map such that x_i ∈ G_σ(i) for each 1 ≤ i ≤ N. For example, the group-mapped function of f(a, b, c) with the grouping G = {{a, b}, {c}} is f_G(g_1, g_2) = f(g_1, g_1, g_2). Once we have a grouping G, its group attribution φ_G is defined as the attribution scores of f*_G:

φ_G = (φ_{G_1}, ..., φ_{G_M}) = Φ(f*_G, 0) ∈ R^M    (4)

By definition, each group score φ_{G_j} indicates the co-contribution of its components x_i ∈ G_j. Note that it is not necessarily equal to the sum of the component scores in general. Similarly, we can derive the reliability loss of a group attribution φ_G as

ξ(φ_G) = Ξ(f*_G, φ_G)    (5)

which quantifies how reliable the group attribution is in explaining the prediction. From the definition, we say a group attribution φ_G is more reliable than the component attribution φ if ξ(φ_G) < ξ(φ) and less reliable if ξ(φ_G) > ξ(φ).
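The group map σ in (3) is mechanical to implement. A minimal sketch, where component indices stand in for the variables and the toy function is an illustrative assumption:

```python
def group_mapped(f, groups, n):
    """Build the group-mapped function f*_G from f and a partition `groups`
    of the component indices 0..n-1.

    sigma[i] is the index of the group containing component i, so that
    f_G(g_1, ..., g_M) = f(g_sigma(1), ..., g_sigma(n))."""
    sigma = {}
    for j, group in enumerate(groups):
        for i in group:
            sigma[i] = j
    assert len(sigma) == n, "groups must partition {0, ..., n-1}"

    def f_G(*g):
        return f(*(g[sigma[i]] for i in range(n)))

    return f_G

# Example from the text: f(a, b, c) with G = {{a, b}, {c}} gives
# f_G(g1, g2) = f(g1, g1, g2).
f = lambda a, b, c: a + 2 * b + 3 * c
f_G = group_mapped(f, [[0, 1], [2]], 3)
```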

2.3. COMPONENT-INTERPRETATION OF A GROUP ATTRIBUTION AND ITS RELIABILITY LOSS

As discussed in the introduction, a group attribution does not attribute scores to the components belonging to each group, so its component-interpreted scores generally have a larger reliability loss than the original component attribution. To address this, we first formalize the score-interpreting function ζ, which interprets a group attribution φ_G in terms of the individual components, denoted with a tilde:

φ̃_G = (φ̃_1, ..., φ̃_N) = ζ(φ_G) ∈ R^N    (6)

Note that the interpreting policy can vary depending on the semantics of the components or scores, but it must not utilize any information about the prediction. For example, defining ζ(φ_G) = φ is not acceptable. Consequently, the component-wise reliability loss of a group attribution φ_G is given as

ξ(φ̃_G) = Ξ(f*, φ̃_G)    (7)
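As one admissible example of ζ (a hypothetical choice for illustration, not the policy used in our experiments), a group score can simply be split equally among its member components; this uses no information about the prediction, as required:

```python
def interpret_uniform(group_scores, groups, n):
    """A minimal score-interpreting function zeta: split each group score
    equally among its member components.  It looks only at the grouping and
    the group scores, never at the prediction itself."""
    tilde = [0.0] * n
    for score, group in zip(group_scores, groups):
        for i in group:
            tilde[i] = score / len(group)
    return tilde
```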

2.4. FORMULATING THE OPTIMIZATION PROBLEM OF THE RELIABILITY TRADE-OFF

To formalize the trade-off of the dual reliabilities of a group attribution, we first normalize the improvement and deterioration of the reliability losses. As the reliability loss of a group attribution ξ(φ_G) is expected to be lower than that of the original component attribution ξ(φ), we take ξ(φ) as the baseline and define the normalized score for a group attribution's reliability (NGR) G as the ratio of the improved amount to the baseline:

G(G) = (ξ(φ) − ξ(φ_G)) / ξ(φ)

It follows that a higher G is better: it becomes 1 if ξ(φ_G) = 0 (maximum improvement) and 0 if ξ(φ_G) = ξ(φ) (no improvement). It can be negative if the grouping is ill-chosen. On the other hand, the component-interpreted reliability loss of a group attribution ξ(φ̃_G) is expected to be higher than the original ξ(φ). Since the most uninformative grouping G_all, which merges all components into one group, i.e., G_all = {G_1} = {{x_1, ..., x_N}}, is expected to have the largest component-interpreted reliability loss, we define the normalized score for a group attribution's component-interpreted reliability (NCR) C as the ratio of the saved (less-deteriorated) amount relative to G_all to the total gap:

C(G) = (ξ(φ̃_{G_all}) − ξ(φ̃_G)) / (ξ(φ̃_{G_all}) − ξ(φ))

It likewise follows that a higher C is better: it becomes 1 if ξ(φ̃_G) = ξ(φ) (no deterioration) and 0 if ξ(φ̃_G) = ξ(φ̃_{G_all}) (as deteriorated as G_all). Two singular cases should be avoided when searching for a grouping: the singleton grouping (no grouping) and the all-grouping G_all, whose (G, C) scores are (0, 1) and (1, 0), respectively. We therefore define the optimization objective L as a weighted geometric mean of the two scores:

L(G) = max((G(G) + ε) / (1 + ε), 0)^(1/2 − β) · max((C(G) + ε) / (1 + ε), 0)^(1/2 + β)    (10)

where ε ≥ 0 is a tolerance hyperparameter for dealing with negative G and C values, and β ∈ [−1/2, 1/2] is a balancing hyperparameter: positive β weights C more than G, and vice versa.
It follows that larger L implies better group attribution.
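The objective in (10) is straightforward to evaluate once the reliability losses are known. A sketch, with illustrative (G, C) values standing in for losses computed on a real model:

```python
def ngr(xi_phi, xi_group):
    """NGR: fraction of the baseline reliability loss resolved by grouping."""
    return (xi_phi - xi_group) / xi_phi

def ncr(xi_tilde, xi_tilde_all, xi_phi):
    """NCR: fraction of the worst-case deterioration that the grouping avoids."""
    return (xi_tilde_all - xi_tilde) / (xi_tilde_all - xi_phi)

def objective(g, c, eps=0.1, beta=0.0):
    """L = max((G+eps)/(1+eps), 0)^(1/2-beta) * max((C+eps)/(1+eps), 0)^(1/2+beta)."""
    a = max((g + eps) / (1 + eps), 0.0)
    b = max((c + eps) / (1 + eps), 0.0)
    return a ** (0.5 - beta) * b ** (0.5 + beta)

# At beta = 0 the two singular groupings score identically low, while a
# grouping that balances both reliabilities scores higher.
l_singleton = objective(0.0, 1.0)   # no grouping: (G, C) = (0, 1)
l_all = objective(1.0, 0.0)         # all-grouping: (G, C) = (1, 0)
l_balanced = objective(0.8, 0.6)    # an illustrative balanced grouping
```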

3.1. SHAPLEY VALUE AND ITS RELIABILITY LOSS

The Shapley value originates from cooperative game theory, where it indicates a fair division of a given reward among players. It has been utilized as an axiomatic attribution method for post-hoc model explanations (Lundberg & Lee, 2017), where the players and the reward correspond to binary explanatory components and the output difference of the model prediction, respectively. Formally, let Z = {z_1, ..., z_N} be the set of binary explanatory components and let z = (z_1, ..., z_N) ∈ {0, 1}^N be a coalition state, where z_i = 1 or 0 indicates whether z_i is involved in the coalition or not. Given a model function f : {0, 1}^N → R, the contribution of z_i at a coalition state z is defined as

h_i(z) = f(z_{i=1}) − f(z)

where the notation z_{i=1} means z with z_i assigned to 1. Since h_i(z) is trivially zero when z_i = 1, we restrict the domain of h_i to Z_{i=0} := {z ∈ {0, 1}^N | z_i = 0}. The Shapley value of z_i is then the weighted sum of the contributions h_i over all possible coalition states:

φ_i = Σ_{z ∈ Z_{i=0}} w_N(|z|) h_i(z),   w_N(k) = k!(N − k − 1)! / N!

where |z|, the number of 1s in z, is termed the coalition size. Since Σ_{z ∈ Z_{i=0}} w_N(|z|) = 1, a Shapley value φ_i can be considered the expected value of h_i by regarding the weights w_N(|z|) as probabilities, i.e., φ_i = E_{Z_{i=0}}[h_i]. This perspective naturally leads to measuring the expected L2 error of the contributions:

ξ²_i(a_i) = E_{Z_{i=0}}[(h_i − a_i)²] = Σ_{z ∈ Z_{i=0}} w_N(|z|)(h_i(z) − a_i)²

which we name the Shapley error of attributing z_i with a score a_i. We then define the reliability loss of a Shapley attribution a = (a_1, ..., a_N) as

ξ²(a) = Σ_{i=1}^{N} ξ²_i(a_i)
Analogous to the relation between an ordinary mean and variance, it follows that ξ²_i(a_i) = ξ²_i(φ_i) + (a_i − φ_i)², so the reliability loss satisfies

ξ²(a) = ξ²(φ) + ‖a − φ‖²₂    (15)

which implies that the reliability loss is minimized to ξ²(φ) when a = φ, showing the optimality of the Shapley value attribution.
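For small N, the quantities above can be computed exactly by enumerating all coalition states. The sketch below is a naive O(N·2^N) implementation intended only to illustrate the definitions; the toy game f is an illustrative assumption:

```python
from itertools import product
from math import factorial

def w(N, k):
    """Shapley coalition weight w_N(k) = k!(N-k-1)!/N!."""
    return factorial(k) * factorial(N - k - 1) / factorial(N)

def shapley(f, N):
    """Exact Shapley values by enumerating all coalition states in {0,1}^N."""
    phi = [0.0] * N
    for i in range(N):
        for z in product((0, 1), repeat=N):
            if z[i] == 1:
                continue  # restrict to Z_{i=0}
            z1 = z[:i] + (1,) + z[i + 1:]
            phi[i] += w(N, sum(z)) * (f(z1) - f(z))
    return phi

def shapley_error(f, N, a):
    """Reliability loss xi^2(a) = sum_i E_{Z_{i=0}}[(h_i - a_i)^2]."""
    total = 0.0
    for i in range(N):
        for z in product((0, 1), repeat=N):
            if z[i] == 1:
                continue
            z1 = z[:i] + (1,) + z[i + 1:]
            total += w(N, sum(z)) * ((f(z1) - f(z)) - a[i]) ** 2
    return total

# Toy game with an interaction term between z_0 and z_1; z_2 is a dummy.
f = lambda z: z[0] + 2 * z[1] + 3 * z[0] * z[1]
phi = shapley(f, 3)
```

The efficiency axiom (the values sum to f(1) − f(0)) and the optimality identity (15) both hold for this implementation and can be checked numerically.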

3.2. SHAPLEY GROUP ATTRIBUTION

Let G = {G_1, G_2, ..., G_M} be a partition (grouping) of the component set Z = {z_1, ..., z_N} into non-empty groups. A group-wise coalition state z under the grouping G is restricted to the cases where the components in each group G_j are either all involved or all uninvolved, denoted z[G_j] = 1 and z[G_j] = 0, respectively. Since each group G_j has a binary involvement state, the coalition state has M degrees of freedom and can be represented as an M-dimensional binary vector. The contribution of G_j at a coalition state z is defined as the output difference of f when switching all z_i ∈ G_j to 1:

h_{G_j}(z) = f(z_{G_j=1}) − f(z)

where the notation z_{G_j=1} means that z is assigned z_i = 1 for all z_i ∈ G_j. Analogously to h_i, we take the domain of h_{G_j} to be Z_{G_j=0}, the set of group-wise coalition states z satisfying z[G_j] = 0 and z[G_m] ∈ {0, 1} for all 1 ≤ m ≠ j ≤ M. Consequently, the Shapley value and error of a group G_j are defined as

φ_{G_j} = E_{Z_{G_j=0}}[h_{G_j}],   ξ²_{G_j}(a_{G_j}) = E_{Z_{G_j=0}}[(h_{G_j} − a_{G_j})²]

Note that the expectation E_{Z_{G_j=0}} is not compatible with the component-wise case E_{Z_{i=0}}, since the dimension of the coalition states changes from N to M; this implies that the Shapley value of a group is not in general equal to the sum of its components' Shapley values.
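Group Shapley values can be computed by playing the same game over the M group variables of the group-mapped function. The sketch below reuses a brute-force Shapley routine on the M-player game; the additive toy game (where the group value does coincide with the sum of its members' values, unlike the general case) is an illustrative assumption:

```python
from itertools import product
from math import factorial

def w(N, k):
    """Shapley coalition weight w_N(k) = k!(N-k-1)!/N!."""
    return factorial(k) * factorial(N - k - 1) / factorial(N)

def shapley(f, N):
    """Exact Shapley values by enumerating all coalition states in {0,1}^N."""
    phi = [0.0] * N
    for i in range(N):
        for z in product((0, 1), repeat=N):
            if z[i] == 1:
                continue  # restrict to Z_{i=0}
            z1 = z[:i] + (1,) + z[i + 1:]
            phi[i] += w(N, sum(z)) * (f(z1) - f(z))
    return phi

def group_shapley(f, groups, n):
    """Shapley values of the M-player group game: the components of each
    group share a single binary involvement state."""
    sigma = {i: j for j, g in enumerate(groups) for i in g}

    def f_G(zg):
        return f(tuple(zg[sigma[i]] for i in range(n)))

    return shapley(f_G, len(groups))

# For an additive game, grouping z_0, z_1 yields 1 + 2 = 3 for the first
# group and 3 for the singleton {z_2}.
f_add = lambda z: z[0] + 2 * z[1] + 3 * z[2]
phi_G = group_shapley(f_add, [[0, 1], [2]], 3)
```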

3.3. G-SHAP: ALGORITHM FOR SHAPLEY GROUP ATTRIBUTION

Since the number of groups M can vary over [1, N], it is difficult to evaluate the group-wise Shapley terms from the component-wise terms. Moreover, the group-wise coalition states depend not only on the number of groups but also on the grouping itself. This makes the optimization problem more challenging than the set-partitioning problem, where the target value of each subset is fixed. However, as Guanchu (2022) has shown, Shapley statistics can be approximated effectively by excluding the components or groups that have little effect on the target. Despite the incompatibility of the Shapley weights, the weight decomposition property

w_{N−1}(k) = w_N(k) + w_N(k + 1)

suggests that φ_i and ξ²_i can be decomposed with z_j-conditioned contributions as

φ_i = Σ_{l=0}^{N−2} Σ_{|z|=l} [ w_N(l) h_i(z_{j=0}) + w_N(l + 1) h_i(z_{j=1}) ]

ξ²_i = Σ_{l=0}^{N−2} Σ_{|z|=l} [ w_N(l)(h_i(z_{j=0}) − φ_i)² + w_N(l + 1)(h_i(z_{j=1}) − φ_i)² ]

where z_j (j ≠ i) can be chosen arbitrarily. This implies that if h_i(z_{j=0}), h_i(z_{j=1}) ≈ h_i(z), then φ_{i|j=0}, φ_{i|j=1} ≈ φ_i and ξ_{i|j=0}, ξ_{i|j=1} ≈ ξ_i, so that an appropriate exclusion yields a more accurate approximation of the Shapley statistics. In our method G-SHAP, we take

δ_i = Σ_{j≠i} (φ_{i|j=0} − φ_i)² + (φ_{i|j=1} − φ_i)²

as the heuristic for the exclusion. Components or groups with the top-k δ_i values are considered the core subset K of the given grouping G. Once we have K, we enumerate all binary grouping states {0, 1}^k of K to derive the optimal grouping of K, holding the excluded components or groups fixed. We then apply the optimal grouping and repeat the process until |G| < k. The overall procedure is illustrated in Figure 2.
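The weight decomposition property w_{N−1}(k) = w_N(k) + w_N(k + 1) underlying the conditioning step can be checked directly with exact arithmetic; a small verification sketch:

```python
from fractions import Fraction
from math import factorial

def w(N, k):
    """Shapley coalition weight w_N(k) = k!(N-k-1)!/N!, as an exact fraction."""
    return Fraction(factorial(k) * factorial(N - k - 1), factorial(N))

# Dropping one player from an N-player game turns each (N-1)-player weight
# into the sum of the two N-player weights for that player's two states:
# w_{N-1}(k) = w_N(k) + w_N(k+1).
def decomposition_holds(N):
    return all(w(N - 1, k) == w(N, k) + w(N, k + 1) for k in range(N - 1))
```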

4. EXPERIMENTAL RESULTS

While there are existing group- or cluster-wise explanation methods (Masoomi et al., 2020; Singh et al., 2018), their grouping criteria and objectives differ from ours, so comparing their explanations with ours would not be reasonable. Therefore, as mentioned in the introduction, we focus on verifying the explanatory benefits of our group attribution (G-SHAP) through comparison with the corresponding component attribution (SHAP).

4.1. EXPERIMENTAL SETUP

As mentioned in the introduction, we applied the proposed method to the validation datasets of Flower5 (multi-class) (Mamaev, 2018), MS COCO 2014 (multi-label) (Lin et al., 2014), and Pascal VOC 2012 (multi-label) (Everingham et al., 2012) with an ImageNet 2012 (Russakovsky et al., 2015) pretrained ResNet-50 (He et al., 2016) model, where Flower5 is a subset of the Flower dataset with 5 distinctive classes (daisy, dandelion, rose, sunflower, tulip). We fine-tuned the model to each dataset, reaching average prediction accuracies of 94.7%, 82.9%, and 90.3%, respectively. For the MS COCO and Pascal VOC models, we took the logit value of the top-1 label as the output. Since our heuristic for choosing the core subset requires conditional Shapley values, which have O(N²) terms, we consider superpixels of an image as the explanatory components. We experimented with two superpixel methods, quick-shift (Vedaldi & Soatto, 2008) and graph-based segmentation (Felzenszwalb & Huttenlocher, 2004), which the existing attribution methods LIME (Ribeiro et al., 2016) and XRAI (Kapishnikov et al., 2019) use, respectively. As the Shapley value considers binary input states, we define the input map as a mean-color masking function so that z_i = 0 and 1 correspond to the mean-colored and the original superpixel, respectively. We also define the score-interpreting function ζ as distributing a group score according to pixel area, i.e., [ζ(φ_G)]_j = (w_j / Σ_{z_i ∈ G} w_i) φ_G for z_j ∈ G, where w_i denotes the pixel area of superpixel z_i. We set the core subset dimension k = 10, β = 0.0, and ε = 0.1 as defaults.
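The area-weighted score-interpreting function ζ defined above can be sketched as follows, with superpixel areas and group scores as illustrative inputs:

```python
def interpret_by_area(group_scores, groups, areas):
    """Distribute each group score over its superpixels in proportion to
    pixel area: tilde_phi_j = (w_j / sum_{i in G} w_i) * phi_G."""
    tilde = [0.0] * len(areas)
    for score, group in zip(group_scores, groups):
        total = sum(areas[i] for i in group)
        for i in group:
            tilde[i] = areas[i] / total * score
    return tilde
```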

4.2. OPTIMIZATION EFFECTS OF G-SHAP

We observed NGR and NCR to verify the improved and saved Shapley error of the G-SHAP attribution for β = −0.25, 0.00, and 0.25, reported in Table 1 (left). As NGR and NCR are normalized ratios, the results show that 75% ∼ 91% of the baseline ξ(φ) is resolved through grouping, while 62% ∼ 68% of the bound gap ξ(φ̃_{G_all}) − ξ(φ) is saved in the β = 0.00 case. We also observed that β considerably affects the NGR and NCR scores: positive β weights NCR more than NGR, whereas negative β weights NGR more than NCR, agreeing with our expectation. Figure 3 shows that the balancing effect can also be verified qualitatively: higher β results in a higher NCR, so the heatmap of G-SHAP is closer to the component attribution (SHAP), but a lower NGR, so the Shapley errors are less improved. Lower β results in the opposite, but also shows that the G-SHAP attribution consists of a few groups of salient superpixels. This implies that our method needs to be compared with baseline grouping methods as a sanity check, which we discuss in a later subsection.

Table 1: Reliability scores of G-SHAP for β = 0.25, 0.00, and −0.25 (left) and comparison with baseline heuristics, where scores are averaged over the datasets (right).

4.3. VALIDATING THE GROUPING APPROACH

To show the validity of our grouping strategy, we compared G-SHAP with various grouping heuristics that would likely yield similar results, reported in Table 1 (right) and illustrated in Figure 4. The details of each grouping heuristic are described below. First, we employed 2-grouping and 3-grouping methods, since G-SHAP with lower β yields few groups. The 2-grouping method sorts the components by Shapley value, splits them into two groups by merging the top 1 ≤ k ≤ N components against the rest, and returns the best split. Similarly, the 3-grouping method considers all groupings consisting of the top-k, bottom-m, and middle-(N − k − m) components and picks the best one. While the results show that their NGR and NCR are slightly lower than ours (by less than 0.1), their explanations contain too little information, since the superpixels of intermediate saliency are neglected. We also employed K-means grouping and Adjacency-greedy grouping to test the performance of merging components with similar attribution scores, as the salient superpixels of a G-SHAP attribution are often attributed higher or similar Shapley values. The K-means method clusters the components into 2 ≤ k ≤ 10 groups and returns the best grouping, where the distance metric is the difference of normalized Shapley values (divided by superpixel area). The Adjacency-greedy method iteratively merges the two groups with the closest normalized Shapley values and returns the best grouping. As these strategies are expected to retain higher NCR scores, both methods show NCR similar to or slightly higher than ours, whereas their NGR is clearly lower. This implies that the two heuristics cannot resolve group-wise Shapley errors, since no interaction statistics are utilized.
In addition, as an ablation study we employed G-SHAP without core-subset searching, named G-SHAP greedy, which instead greedily merges the two groups expected to improve the objective L the most. Both its NGR and NCR are around 0.1 lower than our method on average, implying that the optimization problem is too challenging for a simple greedy approach, so that optimal partition searching within the core subset is necessary.

4.4. ESTIMATION GAMES

The deletion game (Petsiuk et al., 2018; Wagner et al., 2019) is the main strategy for assessing attribution scores: it removes the components of the input data in sequence and evaluates the model output drop through the AUC of the resulting curve. However, these methods usually rely only on the ranking of the attribution scores, so they do not assess the reliability of the attribution in general. Therefore, we employ this idea in a different way, termed the estimation game, which aims to measure the error between the expected and the actual output changes. As the Shapley value indicates the expected model contribution, this assessment approach is intuitive to understand and has also been utilized in Guanchu (2022). For the deletion process, we employ three types of deletion: min-deletion, max-deletion, and random-deletion, which delete (fill with mean color) inputs in increasing, decreasing, and random order of attribution score, respectively. Since model logits can be arbitrarily scaled depending on the prediction, we normalize as follows: (1) we linearly rescale the y-axis such that y = 0 and 1 stand for the ground image (mean-colored image) and the target image, respectively; (2) we linearly rescale the x-axis so that it indicates the ratio of removed pixels to all pixels. The deletion curve therefore always starts at (0, 1) and ends at (1, 0), as illustrated in Figure 5. As shown in Table 2, G-SHAP resolves around 60% to 70% of the L2 estimation error of the component attribution (SHAP), providing a better understanding of the local behavior of the model.
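The curve normalization and the error measured by the estimation game can be sketched as follows; `outputs` stands for the sequence of model outputs recorded while deleting components one at a time, and all numeric values are illustrative:

```python
def normalize_deletion_curve(outputs, y_target, y_ground):
    """Rescale a deletion curve so it starts at (0, 1) and ends at (1, 0).

    outputs[k] is the model output after removing k components; the y-axis
    maps the target image to 1 and the fully mean-colored ground image to 0,
    and the x-axis is the fraction of removed components."""
    n = len(outputs) - 1
    xs = [k / n for k in range(len(outputs))]
    ys = [(y - y_ground) / (y_target - y_ground) for y in outputs]
    return xs, ys

def l2_estimation_error(actual_ys, expected_ys):
    """Mean squared error between the actual output drops and the drops
    expected from the attribution scores."""
    return sum((a - e) ** 2 for a, e in zip(actual_ys, expected_ys)) / len(actual_ys)
```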

5. CONCLUSION

Though input-attribution methods provide clear interpretations, as the explanations correspond directly to the data, the non-linearity of deep models intrinsically limits the reliability of attribution. In this work, we have presented a novel perspective that quantifies this reliability, attributes groups of components, and formulates the trade-off as an optimization problem. We chose the Shapley value as the scoring policy to specify the terms and proposed the grouping algorithm G-SHAP. We have shown the explanatory benefits of our group attribution from multiple perspectives: the improvement in a group attribution's reliability loss is clearly larger than the deterioration of its component-interpreted reliability loss, and the local explainability of a model's prediction is also improved. However, since our method utilizes conditional Shapley terms and searches partition spaces iteratively, its computation cost is too high to start from pixel-wise components. A deeper analytical treatment and the use of prior information about the input components could improve the performance and feasibility, which we leave for future work.



Figure 2: Illustration of the G-SHAP algorithm with the corresponding NGR-NCR graph: (a) initial stage: grouping starts with the singleton grouping (equal to the component attribution), where (G, C) = (0, 1); (b) an arbitrary intermediate step; (c) the very next step: the optimal grouping of the core subset K is applied; (d) last step: the remaining number of groups is less than the core subset size. Dotted curves indicate contour lines of the objective L, and the star mark indicates the best Shapley group attribution, which G-SHAP finally returns.

Figure 3: G-SHAP results for the image classification task, taken from the MS COCO, Flower5, and Pascal VOC datasets, where the superpixels are graph-based, graph-based, and quick-shift for each image, respectively. The 3rd-5th columns stand for β = 0.25, 0.00, and −0.25, respectively. For each image, the heatmap in the upper row indicates the attribution scores and the lower row the attribution reliability. Heatmaps show the area-normalized ratio to their base values, i.e., their sum divided by the entire area of the image.

Figure 4: Comparison results of G-SHAP with various heuristic methods, where the images are taken from the COCO, Flower5, and Pascal VOC datasets, and the superpixels are chosen by the quick-shift, quick-shift, and graph-based methods, respectively.

Table 1 (quick-shift rows):

Superpixels   Dataset   β = 0.25        β = 0.00        β = −0.25
                        NGR    NCR      NGR    NCR      NGR    NCR
Quick-shift   COCO      0.596  0.832    0.799  0.679    0.939  0.667
Quick-shift   Flower5   0.593  0.831    0.758  0.642    0.919  0.616
Quick-shift   VOC       0.602  0.829    0.824  0.670    0.942  0.680

Figure 5: Illustration of the estimation game, measuring the error of the expected drop to the actual drop.

Table 2: Estimation game results of component attribution (SHAP) and group attribution (G-SHAP).

