GRAPH STRUCTURAL AGGREGATION FOR EXPLAINABLE LEARNING

Abstract

Graph neural networks have proven very effective at solving several tasks on graphs, such as node classification and link prediction. These algorithms, which operate by propagating information from vertices to their neighbors, build node embeddings that contain local information. To use graph neural networks for graph classification, node embeddings must be aggregated into a graph representation able to discriminate among different graphs (of possibly various sizes). Moreover, as with neural networks for image classification, there is a need for explainability regarding the features that are selected in the graph classification process. To this end, we introduce StructAgg, a simple yet effective aggregation process based on the identification of structural roles for nodes in graphs, which we use to create an end-to-end model. Through extensive experiments we show that this architecture can compete with state-of-the-art methods. We show how this aggregation step clusters together nodes that have comparable structural roles and how these roles provide explainability to this neural network model. The aggregation process is computed in an end-to-end trainable fashion. We build a hierarchical representation of nodes by using graph neural network models to propagate nodes' features at different hops. Recently there has been a rich line of research, inspired by deep models in images, that aims at

1. INTRODUCTION

Convolutional neural networks (CNNs) (LeCun et al., 1995) have proven to be very efficient at learning meaningful patterns for many artificial intelligence tasks. They are able to learn hierarchical information in data with Euclidean grid-like structures such as images and text. CNNs have rapidly become state-of-the-art methods in the fields of computer vision (Russakovsky et al., 2015) and natural language processing (Devlin et al., 2018). However, in many scientific fields the data under study have an underlying graph or manifold structure, such as communication networks (whether social or technical) or knowledge graphs. Recently there have been many attempts to extend convolutions to such non-Euclidean structured data (Hammond et al., 2011; Kipf & Welling, 2016; Defferrard et al., 2016). In these approaches, the authors propose to compute node embeddings in a semi-supervised fashion in order to perform node classification. Those node embeddings can also be used for link prediction by computing distances between pairs of nodes in the graph (Hammond et al., 2011; Kipf & Welling, 2016).

Graph classification is studied in many fields, whether to predict the chemical activity of a molecule or to cluster authors from different scientific domains based on their ego-networks (Freeman, 1982). However, generalizing neural network approaches to graph classification raises several issues that do not arise in image classification. First, graphs can have different sizes, so to compare them we need a graph representation that is independent of the size of the graph. Second, for a fixed graph, nodes are not ordered. The graph representation obtained with neural network algorithms must therefore be independent of the order of the nodes, that is, invariant under node permutation.
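The two requirements above (a representation whose size does not depend on the number of nodes, and invariance under node permutation) can be illustrated with a minimal sketch. The functions `mean_readout` and `max_readout` and the toy embeddings below are hypothetical names and values introduced only for illustration; they show the simplest kind of readout, not the aggregation proposed in this paper.

```python
from typing import List

Matrix = List[List[float]]  # one row per node, one column per embedding dimension


def mean_readout(node_embeddings: Matrix) -> List[float]:
    """Average the node embeddings column by column into one graph vector."""
    n_nodes = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(row[j] for row in node_embeddings) / n_nodes for j in range(dim)]


def max_readout(node_embeddings: Matrix) -> List[float]:
    """Take the column-wise maximum of the node embeddings."""
    dim = len(node_embeddings[0])
    return [max(row[j] for row in node_embeddings) for j in range(dim)]


# Toy 2-dimensional embeddings for a 3-node graph and a 5-node graph (made-up values).
small_graph = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
large_graph = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0], [0.0, 0.0]]

# The same 3-node graph with its nodes listed in a different order.
shuffled_small_graph = [small_graph[2], small_graph[0], small_graph[1]]

# Both graphs map to vectors of the same length, whatever their number of nodes.
small_vector = mean_readout(small_graph)
large_vector = mean_readout(large_graph)

# Reordering the nodes leaves the readout unchanged.
same_after_shuffle = mean_readout(small_graph) == mean_readout(shuffled_small_graph)
```

Both readouts reduce over the node axis with a commutative operation, which is what makes the result independent of node order; the output length equals the embedding dimension, not the graph size.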
Aggregation functions operate on node embeddings to produce a graph representation. In graph classification tasks, the aggregation function is usually just a mean or a max of the node embeddings, as illustrated in Figure 1b. But when working with large graphs, the mean over all nodes does not allow us to extract significant patterns with good discriminative power. In order to identify patterns in graphs, some methods try to identify structural roles for nodes. Donnat et al. (2018) define structural role discovery as the process of identifying nodes which have topologically similar network neighborhoods while residing in potentially distant areas of the network, as illustrated in Figure 1a. Those structural roles represent local patterns in graphs. Identifying them and comparing them among graphs could improve the discriminative power of graph embeddings obtained with graph neural networks. In this work, we build an aggregation process based on the identification of structural roles, called StructAgg. The main contributions of this work are summarized below:

1. Learned aggregation process. A differentiable aggregation process that learns how to aggregate node embeddings in order to produce a graph representation for a graph classification task.
2. Identification of structural roles. Based on the definition of structural roles from Donnat et al. (2018), our algorithm learns structural roles during the aggregation process. This is innovative because most algorithms that learn structural roles in graphs are not based on graph neural networks.
3. Explainability of selected features for a graph classification task. The identification of structural roles enables us to understand and explain which features are selected during training. Graph neural networks often lack explainability, and only a few works tackle this issue. We show how our end-to-end model provides interpretability to a graph classification task based on graph neural networks.
4. Experimental results. Our method achieves state-of-the-art results on benchmark datasets. We compare it with kernel methods and state-of-the-art message passing algorithms that use pooling layers as aggregation processes.

(a) Nodes with the same structural role are classified together (same color). (b) Aggregation process to create a graph embedding: the node features are summed to produce a representation of the graph.
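To make the idea of a role-based aggregation concrete, here is a minimal sketch under stated assumptions. It is not the StructAgg implementation, whose details are not given in this section: the function `role_aggregate`, the use of a softmax, and the hard-coded `role_scores` are all hypothetical. In a trained model the scores would come from a learned, differentiable function of the node embeddings; here they are fixed by hand to keep the example self-contained.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into non-negative weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def role_aggregate(embeddings: Matrix, role_scores: Matrix) -> Matrix:
    """Sum node embeddings per role, weighted by a soft role assignment.

    embeddings:  n_nodes x dim
    role_scores: n_nodes x n_roles (assumed to be produced by some model)
    returns:     n_roles x dim, whatever the number of nodes
    """
    n_roles = len(role_scores[0])
    dim = len(embeddings[0])
    pooled = [[0.0] * dim for _ in range(n_roles)]
    for x, scores in zip(embeddings, role_scores):
        weights = softmax(scores)  # soft membership of this node in each role
        for r, w in enumerate(weights):
            for j in range(dim):
                pooled[r][j] += w * x[j]
    return pooled


# Toy data (made-up values): 3 nodes, 2-dimensional embeddings, 2 roles.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
role_scores = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]]

pooled = role_aggregate(embeddings, role_scores)  # 2 roles x 2 dimensions
```

Because each node contributes to a per-role sum, the output has one row per role regardless of the graph size, and reordering the nodes only reorders the terms of each sum. Inspecting which nodes carry the largest weight for a given role is one way such an aggregation could support explanations.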

2. RELATED WORK

The identification of nodes that have similar structural roles is usually done by an explicit featurization of nodes or by algorithms that rely on random walks to explore nodes' neighborhoods. A well-known algorithm in this line of research is RolX (Gilpin et al., 2013; Henderson et al., 2012), a matrix factorization method that computes a soft assignment matrix from a list of topological properties given as input features for the nodes. Similarly, struc2vec builds a multilayered graph based on topological metrics on nodes and then generates random walks to capture structural information. In another line of research, many works rely on graphlets to capture nodes' topological properties and identify nodes with similar neighborhoods (Rossi et al., 2017; Lee et al., 2018; Ahmed et al., 2018). Donnat et al. (2018) compute node embeddings from wavelets on graphs to characterize nodes' neighborhoods at different scales.
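As a rough illustration of the "explicit featurization" family, the sketch below describes each node by a few hand-picked topological properties and groups nodes whose descriptions match exactly. This is a deliberately simplified stand-in, not RolX or any of the cited methods, which use richer features and soft assignments; the feature choice, the function names and the toy graph are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph as an adjacency mapping


def structural_features(adj: Graph, node: str) -> Tuple[int, Tuple[int, ...], int]:
    """Describe a node by its degree, its neighbors' degrees and its triangle count."""
    neighbors = adj[node]
    degree = len(neighbors)
    neighbor_degrees = tuple(sorted(len(adj[v]) for v in neighbors))
    # Count pairs of neighbors that are themselves connected.
    triangles = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return (degree, neighbor_degrees, triangles)


def group_by_role(adj: Graph) -> List[List[str]]:
    """Group nodes that share exactly the same structural description."""
    roles = defaultdict(list)
    for node in sorted(adj):
        roles[structural_features(adj, node)].append(node)
    return sorted(roles.values())


# Toy graph: two hubs (A and B), each with two leaves, joined through a bridge node m.
edges = [("A", "a1"), ("A", "a2"), ("A", "m"), ("m", "B"), ("B", "b1"), ("B", "b2")]
adj: Graph = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

roles = group_by_role(adj)
# → [['A', 'B'], ['a1', 'a2', 'b1', 'b2'], ['m']]
```

The hubs A and B are not adjacent, yet they end up in the same group because their neighborhoods look alike, which matches the notion of structural role used above. Exact matching is brittle on real graphs, which is one reason the cited methods rely on soft assignments or learned embeddings instead.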



Figure 1: Identification of structural roles and aggregation of node features over the whole graph.

