FORMAL CONCEPTUAL VIEWS IN NEURAL NETWORKS

Abstract

Explaining neural network models is a challenging task that remains unsolved in its entirety to this day. This is especially true for high-dimensional and complex data. With the present work, we introduce two notions of conceptual views of a neural network, specifically a many-valued and a symbolic view. Both provide novel analysis methods that enable a human AI analyst to gain deeper insights into the knowledge captured by the neurons of a network. We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets. Furthermore, we show to which extent the views allow us to quantify the conceptual similarity of different learning architectures. Finally, we demonstrate how conceptual views can be applied for abductive learning of human-comprehensible rules from neurons. In summary, with our work, we contribute to the highly relevant task of globally explaining neural network models.

1. INTRODUCTION

Neural networks (NN) are known for their great performance in solving learning problems. However, these excellent results are almost always achieved at the price of human explainability. This problem is addressed in research and practice from different standpoints. There are calls to refrain from using NNs for important problems and to rely on explainable methods instead, even if they give worse results in terms of accuracy (Rudin, 2019). The second major direction is to develop methods for explaining NN models. Such explanations can be classified as local explanations, i.e., why a particular data point was treated in a specific manner (Ribeiro et al., 2016), and global explanations, i.e., approaches for explaining the whole NN model. The latter can be achieved, e.g., by mapping the NN to an explainable surrogate. A common approach for locally explaining NN models is to highlight activations at some hidden layer (Fong & Vedaldi, 2018) or, if possible, to project them back into the input. For flat data, e.g., images, this is a viable approach, since an essential explanatory component, the human, can be integrated into the process. This is not the case for high-dimensional or complex data. Global approaches are more difficult, in particular for high-dimensional data, and therefore less frequent. A typical idea is to find an (explainable) surrogate for a NN, e.g., via symbolic regression (Alaa & van der Schaar, 2019). We respond to the still growing interest in global explanation procedures for NN models by introducing a novel intermediate space, called (symbolic) conceptual views. We demonstrate how NN models can be represented in these views and how surrogate training, e.g., with decision trees, can profit from this. We further demonstrate how to compare NN models, e.g., when derived from diverse architectures, using the Gromov-Wasserstein distance (Mémoli, 2011) within the views.
Moreover, we demonstrate how symbolic conceptual views can be used to represent NN models with formal concept lattices (Ganter & Wille, 1999) and to profit from their human-centered approach to explainable data analysis. Finally, we show, by an application of subgroup discovery, how human-comprehensible propositional statements can be derived from NN models with the use of background knowledge. This allows us to extract global rules in the form of propositional statements over the neurons of the NN.

2. RELATED WORK

Several approaches aim to provide insights into or explanations of neural networks. Many of them highlight parts of the input that were relevant for a particular prediction (Ribeiro et al., 2016), so-called local explanations. These, however, rely on the user's capability to comprehend input data representations. Hence, this approach is infeasible for higher-dimensional learning problems. To overcome this limitation, the state of the art is to interpret models using symbolic concepts, an approach of neuro→symbolic AI (Sarker et al., 2022). For example, Mao et al. (2019), Asai & Fukunaga (2018) and Fong & Vedaldi (2018) introduce methods that classify the inputs of a model into pre-defined concepts. Hence, they require manually created input representations for all pre-defined concepts, instead of extracting them automatically. Particularly successful is TCAV (Kim et al., 2018), which predicts the importance of user-defined concepts. The above are complemented by methods that automatically detect concepts for a given set of input/output pairs by identifying similar patterns among input samples at a given layer, e.g., ACE (Ghorbani et al., 2019). So far, these methods detect only particularly salient concepts. Recent works try to estimate to which extent a detected set of concepts is capable of approximating the model (Yeh et al., 2020). This approach, however, emphasizes classification performance rather than explainability, i.e., concepts that are important for explanations may be omitted. This is generally true for surrogate-based procedures that were not designed with human comprehensibility in mind (Alaa & van der Schaar, 2019). Moreover, a recent study shows that explanations derived from initial layers often correlate with those of randomized layers or with gradient detectors in the input (Adebayo et al., 2018). The most crucial downside of the automatic detection methods above is that, although they provide symbolic concepts, these do not have to be interpretable.
The overall principle of our approach is based on the fact that a substantial portion of the input data is aggregated and represented in the last hidden layer (Clark et al., 2019; Korbar et al., 2017). A global interpretation of the NN requires a decoding into a human-comprehensible symbolic view. A mathematical method for human-comprehensible conceptualizations in the language of algebra is formal concept analysis (FCA) (Ganter & Wille, 1999; Wille, 1982). In particular, its well-elaborated conceptual scaling theory (Ganter & Wille, 1989) provides an extensive tool-set to analyze NNs. This tool-set enables both the translation of neural representations into a symbolic space and the subsequent translation of this space into a human-explainable space (Hanika & Hirth, 2022; 2021).

3. THE VIEWS OF NEURAL NETWORKS

We introduce in the following two notions of conceptual views of a neural network, namely a many-valued and a symbolic view. Both provide novel methods that enable a human AI analyst to gain deeper insights into the knowledge captured by the neurons. In addition, the symbolic view facilitates the application of abductive learning procedures. This results in rules that explain a NN both in human-comprehensible terminology and in terms of the neurons.

Let $N$ be the set of neurons of the last hidden layer of a NN. We interpret a NN as a function that maps input objects $g \in G$, represented as $g = (v_1, \ldots, v_m) \in \mathbb{R}^m$, to outputs in $[0,1]^{|C|}$ for classes $C$. The parameter $m$ specifies the number of input features (see Figure 1). Naturally, we can interpret each neuron $n \in N$ as a function in its own right, from the input layer up to the activation of $n$, i.e., $n : \mathbb{R}^m \to \mathbb{R}$. The output neurons can be characterized analogously by a map $c : \mathbb{R}^{|N|} \to \mathbb{R}$. With $w_{i,j}$ we denote the weight connecting the output neuron $c_i \in C$ with the hidden neuron $n_j \in N$.

Definition 1 (Many-Valued Conceptual View). Let NN be a neural network, $C$ its output classes and $N = \{n_1, \ldots, n_h\}$ the neurons of the last hidden layer. We define the many-valued conceptual view as $V = (O, W)$, where $O \in \mathbb{R}^{|G| \times |N|}$ with value at $(i,j)$ equal to the activation $n_j(g_i)$, called Object View, and $W \in \mathbb{R}^{|C| \times |N|}$ with value at $(i,j)$ equal to the weight $w_{i,j}$, called Class View.

To give a short motivation: with the object view $O$, we want to study the activations of the neurons $N$ given an object $g$. Complementarily, with the class view $W$, we investigate the relation of the neurons $N$ to the outputs $c \in C$ via their corresponding weights $w_{i,j}$. For an example, we refer the reader to Figure 1, which depicts the object and class view (right) of the network (left).
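The construction of the many-valued conceptual view can be sketched in code. The following is a minimal illustration with NumPy, using a hypothetical one-hidden-layer ReLU network as a stand-in for a trained model; all shapes and names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected network: the hidden layer plays the role of N,
# the output weights play the role of the w_{i,j}.
m, h, n_classes = 4, 5, 3                # input features, |N|, |C|
W_hidden = rng.normal(size=(m, h))       # input -> hidden weights
W_out = rng.normal(size=(h, n_classes))  # hidden -> output weights w_{i,j}

def hidden_activations(G):
    """Evaluate every neuron n_j on every object g_i (ReLU activation)."""
    return np.maximum(G @ W_hidden, 0.0)

G = rng.normal(size=(10, m))             # ten input objects g in R^m

# Many-valued conceptual view V = (O, W):
O = hidden_activations(G)                # Object View, shape |G| x |N|
W = W_out.T                              # Class View,  shape |C| x |N|

print(O.shape, W.shape)                  # (10, 5) (3, 5)
```

Each row of `O` is an object represented in the shared neuron space, and each row of `W` is a class in the same space, which is exactly what the similarity-based reading below relies on.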
In this example, we find that $n_k(o_t)$ is greater than $n_1(o_t)$, from which we infer that $o_t$ is more strongly related to $n_k$ than to $n_1$. We want to employ the just introduced views to comprehend the complete classification captured by a NN model. We can represent any object $g$ as a row of the object view matrix, i.e., $O(g) := (n_1(g), \ldots, n_h(g))$. Analogously, we can represent any class $c_i$ as a row of the class view matrix, i.e., $W(c_i) := (w_{i,1}, \ldots, w_{i,h})$. The output of the NN for class $c_i$ follows from the term $O(g) \cdot W(c_i) + b$, where $b$ is a bias. This can be rewritten as $\|O(g)\| \, \|W(c_i)\| \cos(\angle(O(g), W(c_i))) + b$, where $\cos(\angle(O(g), W(c_i)))$ is the cosine of the angle between $O(g)$ and $W(c_i)$. Thus, to understand the inner representation of the classes $C$ within the NN, it may be reasonable to place the objects and classes in the same space and classify objects using similarity measures. Using this approach we can introduce an object-class distance map
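The rewriting of the class output into norms and a cosine can be checked numerically. A minimal sketch, assuming hypothetical random vectors for $O(g)$ and $W(c_i)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical object and class representations in the shared space R^|N|
O_g = rng.normal(size=8)   # row O(g) of the object view
W_c = rng.normal(size=8)   # row W(c_i) of the class view
b = 0.3                    # bias of the output neuron c_i

# NN output for class c_i: dot product plus bias
logit = O_g @ W_c + b

# Same quantity decomposed as ||O(g)|| * ||W(c_i)|| * cos(angle) + b
cos_angle = (O_g @ W_c) / (np.linalg.norm(O_g) * np.linalg.norm(W_c))
decomposed = np.linalg.norm(O_g) * np.linalg.norm(W_c) * cos_angle + b
```

The two expressions agree up to floating-point error, which is why angular (cosine) similarity between object and class rows is a meaningful classification signal in this shared space.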

