GRAPH NEURAL NETWORK POOLING BY EDGE CUT

Abstract

Graph neural networks (GNNs) are very effective at solving several tasks on graphs, such as node classification or graph classification. They originate from an adaptation of convolutional neural networks (CNNs) for images to graph-structured data. CNNs are very effective at finding patterns that discriminate images from one another. Another aspect of their success is their ability to uncover hierarchical structures, which comes from the pooling operation that produces versions of the input image at different scales. In the same way, we want to identify patterns at different scales in graphs in order to improve classification accuracy. Compared to the case of images, developing a pooling layer on graphs is not trivial, mainly because nodes in graphs are not ordered and have irregular neighborhoods. To alleviate this issue, we propose a pooling layer based on edge cuts in graphs. This pooling layer computes edge scores that correspond to the importance of edges in the information propagation process of the GNN. Moreover, we define a regularization function that aims at producing edge scores that minimize the minCUT objective. Finally, through extensive experiments we show that this architecture can compete with state-of-the-art methods.

1. INTRODUCTION

Convolutional neural networks (CNNs) (LeCun et al., 1995) have proven very effective at learning meaningful patterns for many artificial intelligence tasks. They are able to learn hierarchical information in data with Euclidean grid-like structures, such as images and textual data. CNNs have rapidly become state-of-the-art methods in the fields of computer vision (Russakovsky et al., 2015) and natural language processing (Devlin et al., 2018). However, in many scientific fields the data under study have an underlying graph or manifold structure, such as communication networks (whether social or technical) or knowledge graphs. Recently there have been many attempts to extend convolution to such non-Euclidean structured data (Hammond et al., 2011; Kipf & Welling, 2016; Defferrard et al., 2016). In these approaches, the authors propose to compute node embeddings in a semi-supervised fashion in order to perform node classification. Those node embeddings can also be used for link prediction by computing distances between pairs of nodes in the graph (Hammond et al., 2011; Kipf & Welling, 2016). An image can be seen as a special case of a graph that lies on a 2D grid, where nodes are pixels and edges are weighted according to the difference in intensity and the distance between two pixels (Zhang et al., 2015; Achanta & Susstrunk, 2017; Van den Bergh et al., 2012; Stutz et al., 2018). In the emerging field of graph analysis based on convolutions and deep neural networks, it is appealing to apply the models that worked best in computer vision. In this effort, several ways to perform convolutions in graphs have been proposed (Hammond et al., 2011; Kipf & Welling, 2016; Defferrard et al., 2016; Gilmer et al., 2017; Veličković et al., 2017; Xu et al., 2018; Battaglia et al., 2016; Kearnes et al., 2016).
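The view of an image as a weighted grid graph mentioned above can be made concrete with a minimal sketch. It is an illustration only, not a construction taken from the cited works: the 4-neighbour connectivity, the Gaussian form of the weights, the function name image_to_grid_graph and the parameters sigma_intensity and sigma_distance are all assumptions chosen for the example.

```python
import numpy as np


def image_to_grid_graph(image, sigma_intensity=0.5, sigma_distance=1.0):
    """Return a dense weighted adjacency matrix for a 2D grayscale image.

    Assumptions (not from the paper): pixels are linked to their 4 grid
    neighbours, and each edge weight is a product of two Gaussian kernels,
    one on the intensity difference and one on the spatial distance.
    """
    height, width = image.shape
    n_nodes = height * width
    adjacency = np.zeros((n_nodes, n_nodes))
    for row in range(height):
        for col in range(width):
            i = row * width + col  # node index of pixel (row, col)
            # Visit only the right and bottom neighbours; symmetry covers the rest.
            for d_row, d_col in ((0, 1), (1, 0)):
                n_row, n_col = row + d_row, col + d_col
                if n_row < height and n_col < width:
                    j = n_row * width + n_col
                    intensity_gap = image[row, col] - image[n_row, n_col]
                    distance = np.hypot(d_row, d_col)
                    weight = np.exp(-(intensity_gap ** 2) / sigma_intensity ** 2)
                    weight *= np.exp(-(distance ** 2) / sigma_distance ** 2)
                    adjacency[i, j] = adjacency[j, i] = weight
    return adjacency


# Tiny example: a dark top row and a bright bottom row.
example_image = np.array([[0.0, 0.0], [1.0, 1.0]])
example_adjacency = image_to_grid_graph(example_image)
```

In this toy image, edges inside a row connect pixels of equal intensity and therefore receive larger weights than edges crossing the intensity boundary, which is the behaviour the weighting is meant to capture.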
Moreover, when dealing with image classification, pooling is an important step (Gao & Ji, 2019; Ying et al., 2018; Defferrard et al., 2016; Diehl, 2019). It allows us to extract hierarchical features in images in order to make the classification more accurate. While it is easy to apply coarsening to an image, it is not obvious how to coarsen a graph, since nodes in graphs are not ordered like pixels in images. In this work we present a novel pooling layer based on edge scoring and related to the minCUT problem. The main contributions of this work are summarized below:

1. Learned pooling layer. A differentiable pooling layer that learns how to aggregate nodes into clusters to produce a pooled graph of reduced size.
2. A novel approach based on edge cuts. Most coarsening strategies are based on nodes, either by finding clusters or by deleting nodes that carry less information about the graph structure. In our approach, we focus on edges to uncover communities of topologically close nodes in graphs.
3. The definition of a regularization that aims at approximating the minCUT problem. We regularize our problem with a term that corresponds to the normalized cut (Ncut) problem in order to learn edge scores and clusters that are consistent with the topology of the graph. We show that by computing an edge score matrix, we can easily compute this regularization term.
4. Experimental results. Our method achieves state-of-the-art results on benchmark datasets. We compare it with kernel methods and state-of-the-art message passing algorithms that use pooling layers as aggregation processes.
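To make the cut quantities named in the contributions concrete, the sketch below evaluates the textbook cut and normalized cut of a hard node partition. It is not the regularization term of this paper, which operates on learned edge scores; the function names cut_value and normalized_cut and the dense NumPy adjacency representation are assumptions made for illustration.

```python
import numpy as np


def cut_value(adjacency, labels):
    """Total weight of edges whose endpoints lie in different clusters."""
    labels = np.asarray(labels)
    crosses = labels[:, None] != labels[None, :]
    # A symmetric adjacency matrix stores every undirected edge twice.
    return adjacency[crosses].sum() / 2.0


def normalized_cut(adjacency, labels):
    """Sum over clusters of (weight leaving the cluster) / (cluster volume).

    The volume of a cluster is the sum of the weighted degrees of its nodes.
    """
    labels = np.asarray(labels)
    degrees = adjacency.sum(axis=1)
    total = 0.0
    for cluster in np.unique(labels):
        inside = labels == cluster
        boundary = adjacency[inside][:, ~inside].sum()
        volume = degrees[inside].sum()
        if volume > 0:  # skip clusters made only of isolated nodes
            total += boundary / volume
    return total
```

For instance, on two triangles joined by a single edge, the partition that separates the triangles cuts one edge, whereas a partition that alternates node labels cuts most of the edges. The normalized variant additionally penalizes clusters with small volume, which discourages trivial cuts that isolate a single node.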

2. RELATED WORK

Recently there has been a rich line of research, inspired by deep models for images, that aims at redefining neural networks on graphs, and in particular convolutional neural networks (Defferrard et al., 2016; Kipf & Welling, 2016; Veličković et al., 2017; Hamilton et al., 2017; Bronstein et al., 2017; Bruna et al., 2013; Scarselli et al., 2009). Those convolutions can be viewed as message passing algorithms that are composed of two phases (Gilmer et al., 2017). Their success comes from their ability to uncover meaningful patterns in graphs by propagating information from nodes to their neighbors. Moreover, many works on graph neural networks also focus on redefining pooling in graphs. The pooling operation allows us to obtain versions of the input graph at different scales. In graphs, the pooling step is not trivial because of the nature of the data: nodes can have different numbers of neighbors and graphs can have different sizes. To cope with these issues, different pooling strategies have been proposed:

• Top-k: As in Gao & Ji (2019), the objective is to score nodes according to their importance in the graph and then to keep only the nodes with the top-k scores. Removing nodes can remove important connections in the graph and produce disconnected graphs, so a step to increase connectivity is necessary. This is done by adding edges between nodes that are 2 hops apart in the input graph.
• Cluster identification: This is usually done by projecting node features on a learned weight to obtain an assignment matrix. Nodes that have close embeddings are projected on the same cluster. Once the assignment matrix is obtained, super nodes at the coarsened level can be computed by aggregating all nodes that belong to the same cluster (Ying et al., 2018).
• Edge based pooling: An edge contraction pooling layer has recently been proposed by Diehl (2019). They compute edge scores in order to successively contract pairs of nodes, which means that they successively merge pairs of nodes that are linked by the edges with the highest scores.
• Deterministic coarsening strategies: Finally, pooling in graphs can simply be performed by applying a deterministic clustering algorithm to identify clusters of nodes that will represent super nodes in the coarsened level (Defferrard et al., 2016; Ying et al., 2018). The main drawback is that the strategy is not learned and thus may not be suited to the graph classification task.

In this work we define a new pooling layer that is based on edge cuts. Like Diehl (2019), we focus our pooling method on edges instead of nodes. In their work, Diehl (2019) calculate scores on edges to perform contraction pooling. This means that at each pooling step, they merge pairs of nodes that are associated with the highest edge scores, without merging nodes that were already involved in a contracted edge. This method results in pooled graphs whose size is half that of the input graph. The main similarity with our work is that we compute edge scores to characterize edge importance, inspired by Graph Attention Networks (Veličković et al., 2017). There are several differences with

