JOINT GAUSSIAN MIXTURE MODEL FOR VERSATILE DEEP VISUAL MODEL EXPLANATION

Abstract

Post-hoc explanations of deep neural networks improve human understanding of a model's learned representations, decision-making process and uncertainty, while remaining faithful to the model. Explaining deep convolutional neural networks (DCNNs) is especially challenging due to the high dimensionality of deep features and the complexity of model inference. Most post-hoc explanation methods serve only a single form of explanation, restricting the diversity and consistency of the explanations. This paper proposes the joint Gaussian mixture model (JGMM), a probabilistic model that jointly models inter-layer deep features and produces faithful and consistent post-hoc explanations. JGMM explains deep features with a Gaussian mixture model, and inter-layer deep feature relations with the posterior distribution over the latent component variables. JGMM enables a versatile explanation framework that unifies interpretable proxy models with global or local explanatory example generation and mining. Experiments on various DCNN image classifiers, in comparison with other explanation methods, show that JGMM efficiently produces versatile, consistent, faithful and understandable explanations.

1. INTRODUCTION

Deep convolutional neural networks (DCNNs) are a powerful class of machine learning models for visual recognition tasks. Their power stems from expressive visual representations and a decision-making mechanism encoded in massive numbers of trainable convolution parameters. However, increasing model complexity places a heavier burden on humans trying to understand the model's learned representations and decision making. The high dimensionality and entanglement of deep features, together with the complexity of neural network inference, are often considered the main obstacles to explaining black-box DCNNs. A recent proliferation of studies on post-hoc DCNN explainability has produced several effective and practical explanation methods. Proxy models are interpretable models (e.g. decision trees and linear models) that approximate the decision-making behavior of the black-box model. Their inference process can be intuitively understood by humans; for example, LIME (Ribeiro et al., 2016) fits a linear classifier as a local proxy model. To ensure that a proxy model is an accurate surrogate, its faithfulness should be tested: the proxy model's predictions on unseen examples should be close to the black-box model's predictions, even when both differ from the ground truth. The intermediate representations of a DCNN are usually explained globally (i.e. not associated with a specific data point) by explanatory examples and by association with semantic concepts. Prototypes, criticisms and influential examples are common types of global explanatory examples. Prototypes are representative examples of a certain pattern of deep features, illustrating a visual concept learned by the model. In contrast, criticisms (Kim et al., 2016) are examples not well represented by the deep representations, i.e. outliers of the deep features, revealing flaws in the learned representations. Influential examples are hard examples for model training, having more influence on the final decision boundary than others; from a post-hoc view, influential examples lie close to the decision boundary. An example can be both influential and representative (or unrepresentative). Local explanations are based on a specific query example, showing how changes to the query example's features affect the model's prediction. Counterfactual examples offer an actionable recourse.
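The local-proxy workflow described above (perturb around a query point, fit a linear surrogate, then test its faithfulness on unseen perturbations) can be sketched as follows. This is a minimal illustration, not the LIME implementation: the `black_box` function, the Gaussian perturbation scheme and the locality weights are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in for a DCNN classifier: a fixed nonlinear scorer (assumption).
    return (np.tanh(X[:, 0] * X[:, 1]) > 0).astype(int)

def local_linear_proxy(x, n_samples=500, sigma=0.3):
    # Perturb around the query point and fit a locally weighted linear
    # model to the black-box labels (weighted least squares surrogate).
    X = x + sigma * rng.standard_normal((n_samples, x.size))
    y = black_box(X)
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * sigma ** 2))  # locality weights
    A = np.c_[X, np.ones(n_samples)] * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef  # per-feature local importance, plus intercept

def fidelity(x, coef, n_test=200, sigma=0.3):
    # Faithfulness check: agreement between the proxy's and the
    # black-box's predictions on fresh, unseen perturbations.
    X = x + sigma * rng.standard_normal((n_test, x.size))
    proxy = (np.c_[X, np.ones(n_test)] @ coef > 0.5).astype(int)
    return np.mean(proxy == black_box(X))

x0 = np.array([1.0, 1.0])
coef = local_linear_proxy(x0)
print("local fidelity:", fidelity(x0, coef))
```

Note that fidelity is measured against the black-box predictions rather than the ground-truth labels, matching the faithfulness criterion described above.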
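The prototype/criticism distinction above can also be illustrated with a density model over deep features: examples with high likelihood under a fitted Gaussian mixture are representative (prototypes), while low-likelihood outliers surface as criticisms, and the posterior over components gives each example's soft cluster membership. A minimal sketch follows; the synthetic `feats` array stands in for pooled DCNN activations, and this mirrors the general idea rather than the paper's JGMM.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
feats = np.vstack([
    rng.normal(0.0, 1.0, (100, 8)),   # one feature cluster (assumption)
    rng.normal(4.0, 1.0, (100, 8)),   # a second cluster
    rng.normal(2.0, 6.0, (5, 8)),     # scattered outliers
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(feats)
loglik = gmm.score_samples(feats)           # per-example log density
prototypes = np.argsort(loglik)[::-1][:5]   # best-represented examples
criticisms = np.argsort(loglik)[:5]         # poorly represented outliers
resp = gmm.predict_proba(feats)             # posterior over latent components

print("prototype ids:", prototypes)
print("criticism ids:", criticisms)
```

The posterior responsibilities in `resp` are the same quantity the abstract's latent-component posterior generalizes across layers.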

