JOINT GAUSSIAN MIXTURE MODEL FOR VERSATILE DEEP VISUAL MODEL EXPLANATION

Abstract

Post-hoc explanations of deep neural networks improve human understanding of the learned representations, the decision-making process, and the uncertainty of the model. Explaining deep convolutional neural networks (DCNNs) is especially challenging due to the high dimensionality of deep features and the complexity of model inference. Most post-hoc explaining methods serve a single form of explanation, restricting the diversity and consistency of the explanations. This paper proposes the joint Gaussian mixture model (JGMM), a probabilistic model that jointly models inter-layer deep features and produces faithful and consistent post-hoc explanations. JGMM explains deep features by Gaussian mixture models and inter-layer deep feature relations by the posterior distribution over the latent component variables. JGMM enables a versatile explaining framework that unifies interpretable proxy models and global or local explanatory example generation and mining. Experiments are performed on various DCNN image classifiers in comparison with other explaining methods, showing that JGMM can efficiently produce versatile, consistent, faithful and understandable explanations.

1. INTRODUCTION

Deep convolutional neural networks (DCNNs) are a powerful type of machine learning model for visual recognition tasks. The power of DCNNs stems from expressive visual representations and a decision-making mechanism encoded in massive numbers of trainable convolution parameters. However, increasing model complexity places a heavier burden on humans trying to understand the learned representations and decision making of the model. The high dimensionality and entanglement of deep features and the complexity of neural network inference are often considered the main hindrances to explaining black-box DCNNs. A recent proliferation of studies in post-hoc DCNN explainability has produced several effective and practical explaining methods. Proxy models are interpretable models (e.g. decision trees and linear models) that approximate the decision-making behavior of the black-box model. Proxy models have an inference process that can be intuitively understood by humans; for example, LIME (proposed by Ribeiro et al. (2016)) uses a linear classifier as a local proxy model. To make sure the proxy model is an accurate surrogate, its faithfulness should be tested: the proxy model's predictions on unseen examples should be close to the black-box model's predictions, even if they differ from the ground truth. Most DCNN explaining methods are single-purpose systems. Employing different explaining methods can yield diverse explanations, but there is no guarantee that the different explanations are compatible and consistent with each other. For example, a counterfactual example generation system may suggest that the model is sensitive to a certain feature, while a global explaining method, such as a proxy model, may conflict with that explanation. There is no hard rule to determine which explanation is correct or more understandable. Thus a generic explaining framework that enables various and consistent explanation forms has important value for the explainability of DCNNs.
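The faithfulness test described above can be made concrete as an agreement rate between the proxy and the black box on held-out samples. The following is a minimal sketch (the function name `fidelity` and the toy threshold models are illustrative assumptions, not from the paper):

```python
import random

def fidelity(black_box, proxy, samples):
    """Fraction of samples on which the proxy reproduces the
    black-box prediction (ground-truth labels are not used)."""
    agree = sum(1 for x in samples if black_box(x) == proxy(x))
    return agree / len(samples)

# Toy 1-D illustration: the "black box" thresholds at 0.5;
# a faithful proxy thresholds at 0.48, an unfaithful one at 0.9.
random.seed(0)
xs = [random.random() for _ in range(1000)]
black_box = lambda x: int(x > 0.5)
good_proxy = lambda x: int(x > 0.48)
bad_proxy = lambda x: int(x > 0.9)

print(fidelity(black_box, good_proxy, xs))  # close to 1.0
print(fidelity(black_box, bad_proxy, xs))   # much lower
```

Note that fidelity is measured against the black-box outputs, not the ground truth, so a proxy can be faithful even where the black box itself is wrong.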
We propose a probabilistic framework for a versatile explaining method for DCNNs. Figure 1 demonstrates the pipeline of the framework and the enabled explanation forms. The lower and higher features are intermediate representations of the DCNN from two different layers. In the computational graph of the DCNN, the higher feature depends on the lower feature through the black-box model (a part of the DCNN) between them. In an image classification setting, the higher feature can be the probabilistic classification prediction of the black-box model, and the lower feature can be the input of the DCNN, i.e. the raw image. The goal of the framework is to explain the learned representations of the higher/lower features as well as the black-box model (either the whole DCNN or a part of it) between them. To serve this purpose, we propose the joint Gaussian mixture model (JGMM) to jointly model the distribution of lower/higher features and produce a validatable proxy model and example-based global/local explanations. JGMM is a probabilistic model based on GMMs: it learns two Gaussian mixture models (GMMs), one for the lower features X and one for the higher features Y. The latent categorical variables of the two GMMs are connected by an estimated posterior probability matrix. JGMM is introduced in detail in section 3. Compared with single-purpose explaining methods, the proposed JGMM-based versatile explaining method has two advantages: (1) various forms of model explanations are efficiently produced from one framework; (2) the consistency among different explanations is guaranteed, as they are computed from a common probabilistic model. The proposed explaining method is evaluated with various DCNN models and benchmarks in comparison with other explaining methods in section 4.
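One plausible way to estimate the matrix connecting the two GMMs' latent variables is from co-occurring component responsibilities on training data. The sketch below assumes both GMMs are already fitted and that the connection is estimated by row-normalising the responsibility co-occurrence counts; this is an illustrative reading, not the paper's exact estimator:

```python
import numpy as np

def component_transition(resp_x, resp_y):
    """Estimate P(c_y = l | c_x = k) from per-example component
    responsibilities of two already-fitted GMMs.

    resp_x: (n, K) responsibilities under the lower-feature GMM
    resp_y: (n, L) responsibilities under the higher-feature GMM
    """
    joint = resp_x.T @ resp_y                   # (K, L) soft co-occurrence
    joint /= joint.sum(axis=1, keepdims=True)   # row-normalise -> conditional
    return joint

# Toy responsibilities for 4 examples, K=2 lower and L=2 higher components.
rx = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
ry = np.array([[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]])
T = component_transition(rx, ry)
print(T)   # each row sums to 1; the diagonal dominates in this toy case
```

A matrix of this form acts as an interpretable proxy: it summarises how lower-feature patterns map to higher-feature patterns with a handful of conditional probabilities.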
The experiments show that JGMM can efficiently produce versatile, consistent, faithful and understandable explanations.

2. RELATED WORKS

Proxy models are a widely used approach to producing model-agnostic explanations. Local interpretable model-agnostic explanations (LIME) by Ribeiro et al. (2016) is a typical method that learns a linear classifier from data points sampled in a local region and labelled by the black-box classifier. For
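The LIME idea can be sketched in a few lines: perturb the query, label the perturbations with the black box, and fit a proximity-weighted linear surrogate. For brevity this sketch fits a regression surrogate on the black-box score rather than a classifier, and all names (`local_linear_proxy`, the toy scoring function) are illustrative assumptions:

```python
import numpy as np

def local_linear_proxy(black_box, x0, n_samples=500, scale=0.3, seed=0):
    """LIME-style sketch: fit a locally weighted linear surrogate to a
    black-box scoring function around the query point x0."""
    rng = np.random.default_rng(seed)
    X = x0 + scale * rng.standard_normal((n_samples, x0.size))
    y = np.array([black_box(x) for x in X])
    # Proximity weights: perturbations near x0 count more.
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([X, np.ones((n_samples, 1))]) * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[:-1], coef[-1]   # feature weights, intercept

# Toy black box: only the first of two features matters locally.
f = lambda x: float(3.0 * x[0] + 0.01 * np.sin(10 * x[1]))
weights, _ = local_linear_proxy(f, np.array([0.5, 0.5]))
print(weights)   # first weight near 3, second near 0
```

The fitted weights are the explanation: locally, the surrogate attributes the black-box score almost entirely to the first feature.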



The intermediate representations of DCNNs are usually explained globally (not associated with a specific data point) by explanatory examples and by association with semantic concepts. Prototypes, criticisms and influential examples are common types of global explanatory examples. Prototypes are representative examples of a certain pattern of deep features, illustrating a visual concept learned by the model. In contrast, criticisms (proposed by Kim et al. (2016)) are examples not well represented in the deep representations, i.e. outliers of the deep features, revealing flaws in the learned representations. Influential examples are hard examples for model training, having more influence on the final decision boundary than others; from a post-hoc view, influential examples lie close to the decision boundary. An example can be both influential and representative (or unrepresentative). Local explanations are based on a specific query example, showing how a change in the query example's features will affect the model prediction. Counterfactual examples offer an actionable recourse for the model decision: they answer 'what if' questions by making a minimal change to the query example's features that results in a model decision different from the query's, thereby revealing the features the model is sensitive to for that query. Semi-factual examples, in contrast, aim to answer the 'even if' question: they change certain features of the query significantly while keeping the same model prediction as the query, revealing the features the model is insensitive to for that query. For example, Kenny & Keane (2021) propose a method to generate counterfactual and semi-factual examples from one system.
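Under a density-based reading such as a fitted mixture model, prototypes and criticisms reduce to ranking examples by model density: high-density examples are well represented (prototypes), low-density examples are outliers (criticisms). The sketch below uses a standard normal log-density as a stand-in for a fitted model; all names are illustrative assumptions, not the method of Kim et al. (2016):

```python
import math, random

def prototypes_and_criticisms(examples, log_density, k=3):
    """Rank examples by model density: the top-k are prototypes
    (well represented), the bottom-k are criticisms (outliers)."""
    ranked = sorted(examples, key=log_density, reverse=True)
    return ranked[:k], ranked[-k:]

# Toy 1-D "fitted model": standard normal log-density.
log_density = lambda x: -0.5 * x * x - 0.5 * math.log(2 * math.pi)
random.seed(1)
data = [random.gauss(0, 1) for _ in range(200)]
protos, crits = prototypes_and_criticisms(data, log_density)
print(protos)  # values near 0 (high density)
print(crits)   # values far in the tails (low density)
```

Because both example types come from one density, the prototypes and criticisms it produces cannot contradict each other, which is the consistency property the JGMM framework aims for.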

Figure 1: Proposed probabilistic framework for a versatile explaining method. The intermediate features (X and Y) are jointly modelled by a probabilistic model, the joint Gaussian mixture model (JGMM).

