HUMAN-INTERPRETABLE MODEL EXPLAINABILITY ON HIGH-DIMENSIONAL DATA

Abstract

The importance of explainability in machine learning continues to grow, as both neural-network architectures and the data they model become increasingly complex. Unique challenges arise when a model's input features become high-dimensional: on one hand, principled model-agnostic approaches to explainability become too computationally expensive; on the other, more efficient explainability algorithms lack natural interpretations for general users. In this work, we introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules. First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data and to ensure its human interpretability. These latent features can be learnt, e.g. explicitly as disentangled representations or implicitly through image-to-image translation, or they can be based on any computable quantities the user chooses. Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable. We benchmark our approach on synthetic data and demonstrate its effectiveness on several image-classification tasks.

1. INTRODUCTION

The explainability of AI systems is important, both for model development and for model assurance. This importance continues to rise as AI models, and the data on which they are trained, become ever more complex. Moreover, methods for AI explainability must be adapted to maintain the human interpretability of explanations in the regime of highly complex data.

Many explainability methods exist in the literature. Model-specific techniques refer to the internal structure of a model in formulating explanations (Chen & Guestrin, 2016; Shrikumar et al., 2017), while model-agnostic methods are based solely on input-output relationships and treat the model as a black box (Breiman, 2001; Ribeiro et al., 2016). Model-agnostic methods offer wide applicability and, importantly, fix a common language for explanations across different model types. The Shapley framework for model-agnostic explainability stands out, due to its theoretically principled foundation and its incorporation of interaction effects between the data's features (Shapley, 1953; Lundberg & Lee, 2017). The Shapley framework has been used for explainability in machine learning for years (Lipovetsky & Conklin, 2001; Kononenko et al., 2010; Štrumbelj & Kononenko, 2014; Datta et al., 2016). Unfortunately, the combinatorics required to capture interaction effects make Shapley values computationally intensive and thus ill-suited for high-dimensional data.

More computationally efficient methods have been developed to explain model predictions on high-dimensional data. Gradient- and perturbation-based methods measure a model prediction's sensitivity to each of its raw input features (Selvaraju et al., 2020; Zhou et al., 2016; Zintgraf et al., 2017). Other methods estimate the mutual information between input features and the model's prediction (Chen et al., 2018a; Schulz et al., 2020), or generate counterfactual feature values that change the model's prediction (Chang et al., 2019; Goyal et al., 2019; Wang & Vasconcelos, 2020). See Fig. 1 for explanations produced by several of these methods (with details given in Sec. 3.5). When intricately understood by the practitioner, these methods can be useful, e.g. for model development. However, many alternative methods exist to achieve broadly the same goal (i.e. to monitor how outputs change as inputs vary), with design choices that make their explanations incomparable to a general user: e.g. the distinct explanations in Fig. 1 all describe the same model prediction. Ideally, a set of axioms (agreed upon or debated) would constrain the space of explanations, leading to a framework of curated methods from which the user can choose based on which axioms are relevant to the application.
A further challenge on high-dimensional data is the sheer complexity of an explanation: in the methods described above, explanations have the same dimensionality as the data itself. Moreover, the importances assigned to raw input features (e.g. pixels) are not individually meaningful to the user. Even when structured patterns emerge in an explanation (e.g. in Fig. 1), this is not sufficient to answer higher-level questions. For example, did the subject's protected attributes (e.g. gender, age, or ethnicity) have any influence on the model's decision?

In this work, we develop methods for explaining predictions in terms of a digestible number of semantically meaningful concepts. We provide several options for transforming from the high-dimensional raw features to a lower-dimensional latent space, which allow varying levels of user control. Regardless of the method used, transformation to a low-dimensional, human-interpretable basis is a useful step if explanations are to satisfy experts and non-experts alike. Once a set of semantic latent features is selected, one must choose an explainability algorithm to obtain quantitative information about why a certain model prediction was made. Fortunately, since the set of latent features is low-dimensional by construction, a Shapley-based approach becomes once again viable. In this work, we develop a method to apply Shapley explainability at the level of semantic latent features, thus providing a theoretically controlled, model-agnostic foundation for explainability on high-dimensional data. Our main contributions are:

• We introduce an approach to model explainability on high-dimensional data that involves encoding the raw input features into a digestible number of semantically meaningful latent features. We develop a procedure to apply Shapley explainability in this context, obtaining Shapley values that describe the high-dimensional model's dependence on each semantic latent feature.

• We demonstrate three methods to extract semantic features for the explanations: Fourier transforms, disentangled representations, and image-to-image translation (an illustrative sketch of the Fourier-based option is given below). We benchmark our approach on dSprites, which has a known latent space, and showcase its effectiveness in computer-vision tasks such as MNIST, CIFAR-10, ImageNet, Describable Textures, and CelebA.
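As a purely illustrative sketch of the Fourier-based option (the band cut-offs, the radial masking scheme, and the function names are our own assumptions, not necessarily the exact construction used later in the paper), one can treat a few radial frequency bands of a grayscale image as its semantic features and reconstruct the image from any chosen subset of bands:

import numpy as np

def band_masks(shape, cutoffs=(0.1, 0.3)):
    # Boolean masks splitting the centred 2D spectrum into low/mid/high-frequency bands.
    h, w = shape
    yy, xx = np.meshgrid(np.arange(h) - h / 2, np.arange(w) - w / 2, indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2) / (0.5 * np.hypot(h, w))
    lo, hi = cutoffs
    return {"low": radius < lo,
            "mid": (radius >= lo) & (radius < hi),
            "high": radius >= hi}

def reconstruct_from_bands(image, keep=("low", "mid", "high")):
    # Invert the Fourier transform using only the selected frequency bands.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = np.zeros(image.shape, dtype=bool)
    for name, m in band_masks(image.shape).items():
        if name in keep:
            mask |= m
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

Dropping a band from keep and re-evaluating a classifier on the reconstruction gives a directly human-readable notion of how much that frequency band matters for the prediction.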

2. SEMANTIC SHAPLEY EXPLAINABILITY

In this section, we present a simple modular framework for obtaining meaningful low-dimensional explanations of model predictions on high-dimensional data. The framework contains two modules: (i) a mechanism for transforming from the high-dimensional space of raw model inputs to a low-dimensional space of semantic latent features, and (ii) an algorithm for generating explanations of the model's predictions in terms of these semantic features. See Fig. 2. We begin by describing module (ii) in Sec. 2.1, where we show how to adapt Shapley explainability to latent features. We then describe several options for module (i) in Sec. 2.2.
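To make the interplay of the two modules concrete, here is a minimal sketch under assumed interfaces: encode and decode are hypothetical names standing in for module (i) (they could wrap a Fourier transform, a disentangled autoencoder, or an image-to-image translator), model is the black-box predictor being explained, and the returned function v(S) is the coalition value that module (ii) feeds into a Shapley computation. Replacing "absent" latent features with a baseline is an illustrative choice, not necessarily the paper's exact procedure.

import numpy as np

def make_latent_value_function(x, model, encode, decode, baseline_z):
    # Build v(S): the model's prediction when only the latent features in the
    # coalition S retain their true values and the rest are set to a baseline.
    z = encode(x)  # module (i): map the raw input to semantic latent features

    def value(coalition):
        z_masked = np.array(baseline_z, dtype=float)
        for i in coalition:
            z_masked[i] = z[i]          # coalition members keep their true values
        return model(decode(z_masked))  # module (ii): evaluate f on the reconstruction x(z)

    return value

Because z is low-dimensional by construction, v can be evaluated on every coalition, which is what makes exact Shapley values over the latent features computationally tractable.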

2.1. SHAPLEY VALUES FOR LATENT FEATURES

Shapley values (Shapley, 1953) were developed in cooperative game theory to distribute the value v(N) earned by a team N = {1, 2, ..., n} among its players. The Shapley value φ_v(i) represents the portion of v(N) attributed to player i, based on i's marginal contributions to all possible coalitions of the other players.
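For reference, the classical closed form (a standard result from cooperative game theory, restated here rather than taken from this paper's own derivation) averages player i's marginal contribution to each coalition S ⊆ N \ {i} with the usual combinatorial weights:

\phi_v(i) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[ v(S \cup \{i\}) - v(S) \bigr]

In our setting the players are the semantic latent features, and v can be any coalition value built from them, such as the f(x(z))-style value function sketched in Sec. 2; since n is small by construction, the sum over coalitions remains tractable.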



Figure 1: Pixel-based explanations of a model trained to predict the attractiveness label in CelebA.

Figure 2: Our proposed framework for semantic explainability.

