A SELF-EXPLANATORY METHOD FOR THE BLACK BOX PROBLEM ON THE DISCRIMINATION PART OF A CNN

Abstract

Recently, to find the inherent causality implied in a CNN, the black box problem of its discrimination part, which is composed of all fully connected layers of the CNN, has been studied by different scientific communities. Many methods have been proposed that extract various interpretable models from the optimal discrimination part, based on the inputs and outputs of that part, in order to find the inherent causality implied in it. However, the inherent causality cannot readily be found in this way. We think the problem can be solved by shrinking an interpretable distance, which evaluates the degree to which the discrimination part can be easily explained by an interpretable model. This paper proposes a lightweight interpretable model, the Deep Cognitive Learning Model (DCLM). A game method between the DCLM and the discrimination part is then implemented to shrink the interpretable distance. Finally, the proposed self-explanatory method was evaluated in contrastive experiments against several baseline methods on standard image-processing benchmarks. These experiments indicate that the proposed method can effectively find the inherent causality implied in the discrimination part of the CNN without substantially reducing its generalization performance. Moreover, the generalization performance of the DCLM itself can also be improved.

1. INTRODUCTION

Convolutional neural networks (CNNs) have surpassed human abilities in some specific tasks such as computer games and computer vision. However, they are considered difficult to understand and explain (Brandon, 2017), which leads to many problems concerning privacy leakage, reliability, and robustness. Explanation technology is of immense help for companies in creating safer, more trustworthy products and in better managing any possible liability (Riccardo et al., 2018). Recently, to find the inherent causality implied in the CNN, the unexplainable problem of the CNN, especially concerning the discrimination part that is composed of the fully connected layers of the CNN, has been studied by different scientific communities. Many methods have been proposed that extract various interpretable models from the optimal discrimination part, based on the inputs and outputs of that part, to express the inherent causality implied in it. However, because of data bias and noisy data in the training data set, the discrimination part is difficult to approximate by any interpretable model, so the inherent causality cannot readily be found. We think the problem can be solved by the following procedure. First, a lightweight interpretable model is designed that can be easily understood by humans. Then, the model is initiatively extracted from the discrimination part by solving a Maximum Satisfiability (MAX-SAT) problem based on the activated states of the neurons in the first layer and the output layer of the part. A new distance is proposed that evaluates the degree to which the discrimination part can be easily explained, referred to as interpretability performance or the interpretable distance. To shrink the interpretable distance, a game process between the interpretable model and the discrimination part is implemented. Finally, the optimal interpretable model is obtained, which expresses the inherent causality implied in the discrimination part.
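The interpretable distance is defined formally later in the paper; as a minimal, purely illustrative sketch (the function names and the squared-error form here are assumptions, not the paper's definition), the disagreement between the discrimination part and an interpretable model could be measured by comparing their outputs on the same sample of inputs:

```python
import numpy as np

def interpretable_distance(f_disc, f_interp, inputs):
    """Hypothetical sketch: mean squared disagreement between the
    discrimination part f_disc and an interpretable model f_interp,
    evaluated on the same batch of inputs."""
    out_d = np.asarray([f_disc(x) for x in inputs])
    out_i = np.asarray([f_interp(x) for x in inputs])
    return float(np.mean((out_d - out_i) ** 2))

# Toy usage: two scalar "models" over 1-D inputs.
xs = np.linspace(0.0, 1.0, 5)
d_same = interpretable_distance(lambda x: 2 * x, lambda x: 2 * x, xs)
d_diff = interpretable_distance(lambda x: x + 1.0, lambda x: x, xs)
```

A distance of zero would mean the interpretable model reproduces the discrimination part exactly on those inputs; shrinking this quantity is the target of the game process described above.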
Moreover, based on this procedure, it is also possible to monitor the evolution of the inherent causality implied in the part during the game process. The main contributions of this paper can be summarized as follows:

• An interpretable model, the Deep Cognitive Learning Model (DCLM), is proposed to express the inherent causality implied in the discrimination part, and a greedy method is given for initiatively extracting the DCLM from the discrimination part by solving its Maximum Satisfiability (MAX-SAT) problem.

• A new game method is proposed to improve the interpretability performance of the discrimination part without substantially reducing its generalization performance, by iteratively shrinking the interpretable distance between the DCLM and the discrimination part.

• A new distance is proposed to evaluate the degree to which the discrimination part can be easily explained, referred to as interpretability performance or the interpretable distance.
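The game between the two models can be pictured as an alternating optimization. The following toy sketch is purely illustrative (the scalar models, update rules, and loss weights are assumptions, not the paper's training procedure): the interpretable model is repeatedly refit to the discrimination part's current behaviour, while the discrimination part is nudged toward both its task objective and agreement with the interpretable model.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=100)
ys = 3.0 * xs            # toy task: learn y = 3x

w_disc = 0.0             # stands in for the discrimination part
w_interp = 0.0           # stands in for the interpretable model
lam, lr = 0.5, 0.1       # agreement weight and learning rate (assumed)

for _ in range(200):
    # Step 1: the interpretable model chases the discrimination part
    # (gradient step on the squared disagreement of their outputs).
    w_interp += lr * np.mean((w_disc - w_interp) * xs * xs)
    # Step 2: the discrimination part minimizes task loss plus
    # lam times the disagreement (the "interpretable distance" term).
    grad_task = np.mean((w_disc * xs - ys) * xs)
    grad_agree = np.mean((w_disc - w_interp) * xs * xs)
    w_disc -= lr * (grad_task + lam * grad_agree)
```

Under this scheme both parameters converge toward the task solution while staying close to each other, which mirrors the paper's goal of shrinking the interpretable distance without sacrificing generalization.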

2. RELATED WORK

There are usually two types of methods for the unexplainable problem of the discrimination part: post-hoc methods and ante-hoc methods (Holzinger et al., 2019). However, because the ante-hoc method is a transparent modeling method (Arrietaa et al., 2020), it cannot obtain an explanation of the discrimination part; therefore, only post-hoc methods are reviewed here. Early post-hoc methods obtain global explanations for a neural network by extracting an interpretable model. Some references (Craven & Shavlik, 1999; Krishnan et al., 1999; Boz, 2002; Johansson & Niklasson, 2009) proposed methods that find a decision tree for explaining a neural network by maximizing the gain ratio and an estimate of the current model fidelity. Other references (Craven & Shavlik, 1994; Johansson & Niklasson, 2003; Augasta & Kathirvalavakumar, 2012; Sebastian et al., 2015; Zilke et al., 2016) proposed rule-extraction methods for searching for optimal interpretable rules in a neural network. Recently, feature-relevance methods have become progressively more popular. Montavon et al. (Montavon et al., 2017) proposed a method that decomposes a network classification decision into contributions of its input elements based on deep Taylor decomposition. Shrikumar et al. (Shrikumar et al., 2016) proposed DeepLIFT, which computes importance scores in a multilayer neural network by explaining the difference of the output from some reference output in terms of the differences of the inputs from their reference inputs. Some other works make complex black-box models simpler. Che et al. (Che et al., 2017) proposed a simple distillation method called Interpretable Mimic Learning for extracting an interpretable simple model with gradient boosting trees. Thiagarajan et al. (Thiagarajan et al., 2016) built a TreeView representation of the complex model by hierarchically partitioning the feature space.
In addition, some references (Hinton et al., 2015; Bucila et al., 2006; Frosst & Hinton, 2017; Traore et al., 2019) proposed methods for distilling knowledge from an ensemble of models into a single model. Wu et al. (M. Wu, 2018) proposed a tree-regularization method via knowledge distillation that represents the output feature space of an RNN with a multilayer perceptron. However, these methods can only address the unexplainable problem of a trained neural network, or of trained deep neural networks with explicit input characteristics. Wan et al. (Wan et al., 2020) constructed a decision tree from the last fully connected layer of the discrimination part of a CNN based on a prior structure. In this paper, our goal is to find the inherent causality implied in the discrimination part of a CNN, which is composed of all fully connected layers of the CNN, without hurting its generalization performance, by initiatively extracting its logic relationships with no prior structure and finally obtaining its explanation from these logic relationships.

3. DEEP COGNITIVE LEARNING MODEL

To express the causal relationships among neurons in the discrimination part, a new interpretable model is designed in this section. As is well known, a CNN consists of a feature extractor and a discrimination part. The feature extractor is composed of convolution layers and pooling layers. The outputs of the feature extractor, namely the feature maps τ1, τ2, ..., τk, where k is the number of feature maps, are the inputs of the discrimination part of the CNN. All these feature maps form a feature set Γ. We suppose that the discrimination part is best explained by the logic relationships of the activated states of the neurons in its first layer and its output layer. This is because the relationships

