SHAPLEY EXPLAINABILITY ON THE DATA MANIFOLD

Abstract

Explainability in AI is crucial for model development, compliance with regulation, and providing operational nuance to predictions. The Shapley framework for explainability attributes a model's predictions to its input features in a mathematically principled and model-agnostic way. However, general implementations of Shapley explainability make an untenable assumption: that the model's features are uncorrelated. In this work, we demonstrate unambiguous drawbacks of this assumption and develop two solutions to Shapley explainability that respect the data manifold. One solution, based on generative modelling, provides flexible access to data imputations; the other directly learns the Shapley value-function, providing performance and stability at the cost of flexibility. While "off-manifold" Shapley values can (i) give rise to incorrect explanations, (ii) hide implicit model dependence on sensitive attributes, and (iii) lead to unintelligible explanations in higher-dimensional data, on-manifold explainability overcomes these problems.

1. INTRODUCTION

Explainability in AI is central to the practical impact of AI on society, thus making it critical to get right. While many dichotomies exist within the field, e.g. local versus global explanations (Ribeiro et al., 2016), post hoc versus intrinsic interpretability (Rudin, 2019), and model-agnostic versus model-specific methods (Shrikumar et al., 2017), in this work we focus on local, post-hoc, model-agnostic explainability, as it provides insight into individual model predictions, does not limit model expressiveness, and is comparable across model types. In this context, explainability can be treated as a problem of attribution. Shapley values (Shapley, 1953) provide the unique attribution method satisfying a set of intuitive axioms; e.g. they capture all interactions between features and sum to the model prediction. The Shapley approach to explainability has matured over the last two decades (Lipovetsky & Conklin, 2001; Kononenko et al., 2010; Štrumbelj & Kononenko, 2014; Datta et al., 2016; Lundberg & Lee, 2017).

Implementations of Shapley explainability suffer from a problem common across model-agnostic methods: they involve marginalisation over features, achieved by splicing data points together and evaluating the model on highly unrealistic inputs (e.g. Fig. 1). Such splicing would only be justified if all features were independent; otherwise, spliced data lies off the data manifold. Outside the Shapley paradigm, emerging explainability methods have begun to address this problem: see e.g. Anders et al. (2020) for a general treatment of the off-manifold problem in gradient-based explainability, and Chang et al. (2019) and Agarwal et al. (2019) for image-specific explanations that respect the data distribution. Within Shapley explainability, initial work towards remedying the off-manifold problem has emerged; e.g. Aas et al. (2019) and Sundararajan & Najmi (2019) explore empirical and kernel-based estimation techniques, but these methods do not scale to complex data. A satisfactorily general and performant solution for computing Shapley values on the data manifold has yet to appear and is a focus of this work.

Our main contributions are twofold:

• Sec. 3 compares on- and off-manifold explainability, focusing on novel and unambiguous shortcomings of off-manifold Shapley values. In particular, we show that off-manifold explanations are often incorrect, and that they can hide implicit model dependence on sensitive features.

• Sec. 4 develops two methods to compute on-manifold Shapley values on general data sets: (i) a flexible generative-modelling technique that learns the data's conditional distributions, and (ii) a simple supervised-learning technique that targets the Shapley value function directly. We demonstrate the effectiveness of these methods in experiments on higher-dimensional data.

2. BACKGROUND ON SHAPLEY EXPLAINABILITY

The Shapley value (Shapley, 1953) is a method from cooperative game theory that distributes credit for the total value v(N) earned by a team N = {1, 2, . . . , n} among its players:

    φ_v(i) = Σ_{S ⊆ N\{i}} [|S|! (n − |S| − 1)! / n!] (v(S ∪ {i}) − v(S))    (1)

where the value function v(S) indicates the value that a coalition of players S would earn without their other teammates. The Shapley value φ_v(i) represents player i's marginal value-added upon joining the team, averaged over all orderings in which the team can be constructed.

In supervised learning, let f_y(x) be a model's predicted probability that data point x belongs to class y.foot_0 To apply Shapley attribution to model explainability, one interprets the features {x_1, . . . , x_n} as players in a game and the output f_y(x) as their earned value. To compute Shapley values, one must define a value function representing the model's output on a coalition x_S ⊆ {x_1, . . . , x_n}. As the model is undefined on the partial input x_S, the standard implementation (Lundberg & Lee, 2017) samples out-of-coalition features x_S̄, where S̄ = N \ S, unconditionally from the data distribution:

    v^(off)_{f_y(x)}(S) = E_{p(x′)} f_y(x_S ∪ x′_S̄)    (2)

We refer to this value function, and the corresponding Shapley values, as lying off the data manifold, since splices x_S ∪ x′_S̄ generically lie far from the data distribution. Alternatively, conditioning out-of-coalition features x_S̄ on in-coalition features x_S results in an on-manifold value function:

    v^(on)_{f_y(x)}(S) = E_{p(x′|x_S)} f_y(x′)    (3)

The conditional distribution p(x′|x_S) is not empirically accessible in practical scenarios with high-dimensional data or many-valued (e.g. continuous) features. A performant method for computing on-manifold Shapley values on general data has until now been lacking and is a focus of this work.

foot_0: The results of this paper can be applied to regression problems by reinterpreting f_y(x) as the model's predicted value rather than its predicted probability.

Figure 1: An MNIST digit, a coalition of pixels in a Shapley calculation, and 5 off-manifold splices.

Shapley values φ_{f_y(x)}(i) provide local explainability for the model's prediction on data point x. To understand the model's global behaviour, one aggregates the φ_{f_y(x)}(i)'s into global Shapley values:

    Φ_f(i) = E_{p(x,y)} φ_{f_y(x)}(i)    (4)

where p(x, y) is the labelled-data distribution. Global Shapley values can be seen as a special case of the global explanation framework introduced by Covert et al. (2020). As a consequence of the axioms (Shapley, 1953) satisfied by the φ_{f_y(x)}(i)'s, global Shapley values satisfy a sum rule:

    Σ_{i∈N} Φ_f(i) = E_{p(x,y)} f_y(x) − E_{p(x′)} E_{p(y)} f_y(x′)    (5)

One interprets the global Shapley value Φ_f(i) as the portion of model accuracy attributable to the i-th feature. Indeed, the first term in Eq. (5) is the accuracy one achieves by sampling labels from f's predicted probability distribution over classes. The offset term, which relates to class balance, is not attributable to any individual feature.
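As a concrete illustration, the exact Shapley formula and the off-manifold value function above can be sketched in a few lines of Python. This is our illustrative sketch, not the paper's implementation: the function names and the tiny linear model are ours, and the exponential coalition enumeration is only feasible for small n.

```python
import itertools
import math

import numpy as np

def shapley_values(value_fn, n):
    """Exact Shapley values phi_v(i), enumerating every coalition S of N \\ {i}.

    value_fn maps a set of feature indices to a scalar v(S).
    Exponential in n, so only viable for small n (illustrative only).
    """
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for S in itertools.combinations(others, size):
                # weight |S|! (n - |S| - 1)! / n! from the Shapley formula
                w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

def off_manifold_value_fn(model, x, background):
    """v_off(S): keep x's in-coalition features, fill the rest from
    background samples drawn unconditionally, and average the model output."""
    def v(S):
        spliced = background.copy()
        if S:
            idx = sorted(S)
            spliced[:, idx] = x[idx]  # splice coalition features into each sample
        return model(spliced).mean()
    return v

# Hypothetical toy setup: a linear model and Gaussian background samples.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])
model = lambda X: X @ w
background = rng.normal(size=(200, 3))
x = np.ones(3)
phi = shapley_values(off_manifold_value_fn(model, x, background), n=3)
```

For a linear model with unconditional splicing, the attributions reduce to w_i (x_i − mean of background feature i), and they sum to f(x) − E f(x′), illustrating the efficiency axiom.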

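The on-manifold value function is harder, because p(x′|x_S) must be modelled. As a minimal sketch, assume the data is well approximated by a multivariate Gaussian N(μ, Σ), whose conditionals are available in closed form (the setting explored by Aas et al., 2019); the function name and toy parameters below are illustrative, not from the paper:

```python
import numpy as np

def gaussian_on_manifold_value_fn(model, x, mu, cov, n_samples=4000, seed=0):
    """v_on(S): average the model over x' ~ p(x'|x_S), assuming x ~ N(mu, cov)
    so the out-of-coalition features have a closed-form conditional."""
    n = len(mu)
    def v(S):
        S = sorted(S)
        Sbar = [j for j in range(n) if j not in S]
        # seed per coalition, so repeated calls with the same S agree exactly
        rng = np.random.default_rng([seed] + S)
        draws = np.tile(x, (n_samples, 1))
        if Sbar:
            if S:
                # condition x_Sbar on the observed coalition values x_S
                A = cov[np.ix_(Sbar, S)] @ np.linalg.inv(cov[np.ix_(S, S)])
                cond_mu = mu[Sbar] + A @ (x[S] - mu[S])
                cond_cov = cov[np.ix_(Sbar, Sbar)] - A @ cov[np.ix_(S, Sbar)]
            else:
                cond_mu, cond_cov = mu, cov
            draws[:, Sbar] = rng.multivariate_normal(cond_mu, cond_cov, size=n_samples)
        return model(draws).mean()
    return v

# Toy check: with a diagonal covariance the features are independent, so
# v(S) should approach sum_{i in S} w_i x_i + sum_{j not in S} w_j mu_j.
w = np.array([1.0, -2.0, 0.5])
model = lambda X: X @ w
mu, cov = np.zeros(3), np.eye(3)
x = np.ones(3)
v = gaussian_on_manifold_value_fn(model, x, mu, cov)
```

When the features are in fact correlated, Σ's off-diagonal terms shift the conditional mean of the out-of-coalition features, which is precisely where on- and off-manifold Shapley values diverge.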
