A SELF-EXPLANATORY METHOD FOR THE BLACK BOX PROBLEM IN THE DISCRIMINATION PART OF A CNN

Abstract

Recently, to uncover the inherent causality implied in a CNN, different scientific communities have studied the black box problem of its discrimination part, which is composed of all of the fully connected layers of the CNN. Many methods have been proposed that extract various interpretable models from the optimal discrimination part, based on the inputs and outputs of that part, in order to find the inherent causality it implies. However, the inherent causality cannot readily be found in this way. We argue that the problem can be solved by shrinking an interpretable distance, which evaluates the degree to which the discrimination part can easily be explained by an interpretable model. This paper proposes a lightweight interpretable model, the Deep Cognitive Learning Model (DCLM). A game process between the DCLM and the discrimination part is then implemented to shrink the interpretable distance. Finally, the proposed self-explanatory method was evaluated in comparative experiments against several baseline methods on standard image processing benchmarks. These experiments indicate that the proposed method can effectively find the inherent causality implied in the discrimination part of the CNN without greatly reducing its generalization performance. Moreover, the generalization performance of the DCLM itself can also be improved.

1. INTRODUCTION

Convolutional neural networks (CNNs) have surpassed human abilities in specific tasks such as computer games and computer vision. However, they are considered difficult to understand and explain (Brandon, 2017), which leads to problems concerning privacy leakage, reliability, and robustness. Explanation technology is of immense help for companies to create safer, more trustworthy products and to better manage any possible liability (Riccardo et al., 2018). Recently, to find the inherent causality implied in a CNN, different scientific communities have studied its unexplainability, especially concerning the discrimination part, which is composed of the fully connected layers of the CNN. Many methods have been proposed that extract various interpretable models from the optimal discrimination part, based on the inputs and outputs of that part, to express the inherent causality it implies. However, because of data bias and noise in the training data set, the discrimination part is difficult to approximate by any interpretable model, so the inherent causality cannot readily be found.

We think the problem can be solved by the following procedure. First, a lightweight interpretable model is designed that can easily be understood by humans. The model is then extracted from the discrimination part by solving a Maximum Satisfiability (MAX-SAT) problem based on the activation states of the neurons in the first layer and the output layer of the part. A new distance is proposed that evaluates the degree to which the discrimination part can easily be explained, referred to as interpretability performance or interpretable distance. To shrink the interpretable distance, a game process between the interpretable model and the discrimination part is implemented. Finally, the optimal interpretable model is obtained, which expresses the inherent causality implied in the discrimination part.
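The paper does not define the interpretable distance at this point, but one plausible minimal form is the rate at which the interpretable model's predictions disagree with those of the discrimination part on the same inputs. The sketch below illustrates that idea; all function names are illustrative assumptions, not definitions from the paper.

```python
# Minimal sketch of a disagreement-based "interpretable distance" between
# the discrimination part f and an interpretable model g. The names
# predicted_label and interpretable_distance are hypothetical.

def predicted_label(logits):
    """Index of the largest output, i.e. the predicted class."""
    return max(range(len(logits)), key=logits.__getitem__)

def interpretable_distance(f_outputs, g_outputs):
    """Fraction of samples on which the two models predict different classes.

    f_outputs, g_outputs: lists of per-class output vectors, one per sample.
    """
    disagreements = sum(
        predicted_label(f) != predicted_label(g)
        for f, g in zip(f_outputs, g_outputs)
    )
    return disagreements / len(f_outputs)
```

Under this reading, a distance of 0 means the interpretable model reproduces the discrimination part's decisions exactly, and the game process would alternately update the two models to drive this quantity down.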
Moreover, based on this procedure, the evolution of the inherent causality implied in the part can also be monitored during the game process. The main contributions of this paper can be summarized as follows:

• An interpretable model, the Deep Cognitive Learning Model (DCLM), is proposed to express the inherent causality implied in the discrimination part, and a greedy method is given
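The greedy method itself is not detailed in this section. As a generic illustration of greedy MAX-SAT solving, the sketch below assigns binary variables (e.g. neuron-activation states) one at a time, keeping whichever truth value satisfies more clauses; the clause encoding and both function names are assumptions for illustration only.

```python
# Generic greedy MAX-SAT sketch. A clause is a list of signed literals:
# literal v > 0 means variable v-1 is True, v < 0 means variable -v-1 is
# False. This is a standard heuristic, not the paper's exact algorithm.

def greedy_max_sat(num_vars, clauses):
    """Assign variables one by one, each time keeping the truth value
    that satisfies more clauses given the assignments made so far."""
    assignment = {}

    def num_satisfied(partial):
        return sum(
            any(
                abs(lit) - 1 in partial
                and partial[abs(lit) - 1] == (lit > 0)
                for lit in clause
            )
            for clause in clauses
        )

    for var in range(num_vars):
        if_true = num_satisfied({**assignment, var: True})
        if_false = num_satisfied({**assignment, var: False})
        assignment[var] = if_true >= if_false
    return assignment

def count_satisfied(assignment, clauses):
    """Number of clauses satisfied by a complete assignment."""
    return sum(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```

A greedy pass like this runs in time linear in the number of variables times the clause count, which is why such heuristics suit a lightweight model extracted repeatedly inside a training loop.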

