FORMAL CONCEPTUAL VIEWS IN NEURAL NETWORKS

Abstract

Explaining neural network models is a challenging task that remains unsolved in its entirety to this day. This is especially true for high-dimensional and complex data. With the present work, we introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view. Both provide novel analysis methods that enable a human AI analyst to gain deeper insights into the knowledge captured by the neurons of a network. We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets. Furthermore, we show to what extent the views allow quantifying the conceptual similarity of different learning architectures. Finally, we demonstrate how conceptual views can be applied for abductive learning of human-comprehensible rules from neurons. In summary, with our work, we contribute to the highly relevant task of globally explaining neural network models.

1. INTRODUCTION

Neural networks (NN) are known for their great performance in solving learning problems. However, these excellent results are almost always achieved at the price of human explainability. This problem is addressed in research and practice from different standpoints. There are calls to refrain from using NN for important problems and to rely on explainable methods instead, even if they give worse results in terms of accuracy (Rudin, 2019). The second major direction is to develop methods for explaining NN models. Such explanations can be classified as local explanations, i.e., why a particular data point was treated in a specific manner (Ribeiro et al., 2016), and global explanations, i.e., approaches for explaining the whole NN model. The latter can be achieved, e.g., by mapping the NN to an explainable surrogate. A common approach for locally explaining NN models is to highlight activation at some hidden layer (Fong & Vedaldi, 2018) or, if possible, project this inversely. For flat data, e.g., images, this is a viable approach, since an essential explanatory component, the human, can be integrated into the process. This is not the case for high-dimensional or complex data. Global approaches are more difficult, in particular for high-dimensional data, and therefore less frequent. A typical idea is to find an (explainable) surrogate for a NN, e.g., via symbolic regression (Alaa & van der Schaar, 2019). We respond to the still growing interest in global explanation procedures for NN models by introducing a novel intermediate space, called (symbolic) conceptual views. We demonstrate how NN models can be represented in these views and how surrogate training, e.g., with decision trees, can profit from this. We further demonstrate how to compare NN models, e.g., when derived from diverse architectures, using the Gromov-Wasserstein distance (Mémoli, 2011) within the views.
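To illustrate the idea of comparing models via their views, the following is a minimal sketch, not the paper's implementation: it treats each view as a matrix of class vectors, builds the intra-view distance matrices, and computes a Gromov-Wasserstein-style distortion by brute force over permutation couplings (a simplification feasible only for tiny examples; all names such as `view_a` are illustrative).

```python
# Hedged sketch: a Gromov-Wasserstein-style comparison of two conceptual views.
# Restricted to permutation couplings for simplicity; real GW solvers (e.g., POT)
# optimize over general couplings.
from itertools import permutations

import numpy as np


def pairwise_dist(view):
    """Euclidean distance matrix between the rows (e.g., class vectors) of a view."""
    diff = view[:, None, :] - view[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))


def gw_permutation(C1, C2):
    """Minimal squared distortion over all row permutations (an upper bound on GW)."""
    n = C1.shape[0]
    best = np.inf
    for perm in permutations(range(n)):
        p = list(perm)
        distortion = ((C1 - C2[np.ix_(p, p)]) ** 2).sum()
        best = min(best, distortion)
    return best


# Two toy "many-valued views": rows = classes, columns = neuron activations.
view_a = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
view_b = view_a[[2, 0, 1]]  # same geometry, rows permuted
print(gw_permutation(pairwise_dist(view_a), pairwise_dist(view_b)))  # ~0.0
```

Because the distortion only compares intra-view distances, two models whose class representations are isometric receive distance zero even if their neurons differ, which is exactly what makes such a distance suitable for cross-architecture comparison.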
Moreover, we demonstrate how symbolic conceptual views can be used to represent NN models with formal concept lattices (Ganter & Wille, 1999) and to profit from their human-centered approach to explainable data analysis. Finally, we show, by an application of subgroup discovery, how human-comprehensible propositional statements can be derived from NN models with the use of background knowledge. This allows us to extract global rules in the form of propositional statements over the neurons of the NN.
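The step from a many-valued to a symbolic view and on to formal concepts can be sketched as follows. This is a toy illustration, not the paper's procedure: activations are binarized with an illustrative threshold of 0.5, and the formal concepts of the resulting binary context are enumerated naively by closing every object subset.

```python
# Hedged sketch: symbolic view via thresholding, then formal concept enumeration.
# The matrix, the threshold, and the naive enumeration are all illustrative.
from itertools import combinations

import numpy as np

activations = np.array([  # rows = objects (e.g., classes), cols = neurons
    [0.9, 0.1, 0.7],
    [0.8, 0.6, 0.2],
    [0.1, 0.7, 0.8],
])
context = activations >= 0.5  # symbolic view: which neurons "fire" per object


def intent(objects):
    """Attributes shared by all given objects (the derivation operator ')."""
    if not objects:
        return frozenset(range(context.shape[1]))
    return frozenset(np.flatnonzero(context[list(objects)].all(axis=0)))


def extent(attributes):
    """Objects possessing all given attributes."""
    if not attributes:
        return frozenset(range(context.shape[0]))
    return frozenset(np.flatnonzero(context[:, list(attributes)].all(axis=1)))


# Enumerate all formal concepts (extent, intent) by closing every object subset.
n_obj = context.shape[0]
concepts = set()
for r in range(n_obj + 1):
    for objs in combinations(range(n_obj), r):
        b = intent(set(objs))
        concepts.add((extent(b), b))
print(len(concepts))  # number of formal concepts of the toy context
```

The set of concepts, ordered by inclusion of extents, forms the concept lattice; dedicated FCA algorithms (e.g., NextClosure) replace the exponential subset enumeration used here for readability.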

2. RELATED WORK

Several approaches aim to provide insights into or explanations of neural networks. Many of them highlight parts of the input that were relevant for a particular prediction (Ribeiro et al., 2016), so-called local explanations. These, however, rely on the user's capability to comprehend input data representations. Hence, this approach is infeasible for higher-dimensional learning problems. To overcome this limitation, the state of the art is to interpret models using symbolic concepts, an approach of

