MEGRAPH: GRAPH REPRESENTATION LEARNING ON CONNECTED MULTI-SCALE GRAPHS

Abstract

We present MeGraph, a novel network architecture for graph-structured data. Given any input graph, we create multi-scale graphs using graph pooling and connect them into a mega graph by adding inter-graph edges according to the graph pooling results. Instead of universally stacking graph convolutions over the mega graph, we apply general graph convolutions over intra-graph edges, while convolutions over inter-graph edges follow a bidirectional pathway that delivers information along the hierarchy for one turn. Graph convolution and graph pooling are the two core elementary operations of MeGraph. In our implementation, we adopt the graph full network (GFuN) and propose the stridden edge contraction pooling (S-EdgePool) with an adjustable pooling ratio, which extend conventional graph convolution and edge contraction pooling, respectively. The MeGraph model repeatedly exchanges information across the multi-scale graphs, enabling a deeper understanding of wide-range correlations in graphs. This distinguishes MeGraph from many recent hierarchical graph neural networks such as Graph U-Nets. We conduct comprehensive empirical studies on dozens of public datasets and observe consistent performance gains compared to baselines. In particular, we establish five new graph theory benchmark tasks that require long-term inference and deduction to solve, on which MeGraph demonstrates dominant performance compared with popular graph neural networks.

1. INTRODUCTION

In real-world applications, many types of data can be naturally organized as graphs, such as social networks, traffic networks and biological data. Recent advances in graph neural networks (GNNs) have inherited the great success of convolutional neural networks (CNNs) on images to deal with graph-structured data. Popular methods include GCN (Kipf & Welling, 2016), GIN (Xu et al., 2018), GAT (Veličković et al., 2018) and Graph U-Nets (Gao & Ji, 2019), etc. The development of CNNs and GNNs has largely co-evolved, and most effective techniques identified for CNNs are also helpful for GNNs. For example, we have witnessed coupled networks for image and graph data, such as CNN vs. GCN, attentional CNNs vs. GAT, and U-Net (Ronneberger et al., 2015) vs. Graph U-Net (Gao & Ji, 2019). Instead of directly transferring advances in CNNs to GNNs, we investigate inherent characteristics of graphs and design a new architecture accordingly.

We use the following example to motivate our design. Consider the problem of identifying the shortest path in a chain graph. Using ordinary graph convolutions, we have to stack multiple graph convolutional layers to enlarge the receptive field until it covers both the source and the destination nodes. However, if the architecture could infer from a larger scope, e.g., by constructing multi-scale graphs in a hierarchy, the shortest path could be estimated more easily by aggregating and delivering information across multiple scopes. In addition, a single turn of information aggregation or delivery over the hierarchical structure might not be sufficient, because estimates must be refined and deduced repeatedly to reach reliable conclusions. That is, the architecture has to repeat the information exchange across the hierarchy multiple times to identify the shortest path with certainty. This example is investigated in our experiments in Section 4.
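The receptive-field limitation described above can be made concrete with a toy sketch (ours, not from the paper): plain message passing computes shortest-path distances in the style of Bellman-Ford, where each "layer" lets information travel one more hop. On a chain, a node at distance L from the source sees nothing until roughly L layers have been stacked, which is exactly why a flat architecture struggles with long-range tasks.

```python
# Toy illustration: shortest-path distances via min-aggregation
# message passing. Each round corresponds to one GNN layer; the
# receptive field grows by one hop per round.
INF = float("inf")

def message_passing_distances(edges, num_nodes, source, num_layers):
    """One Bellman-Ford-style relaxation per 'layer' over an undirected graph."""
    dist = [INF] * num_nodes
    dist[source] = 0.0
    for _ in range(num_layers):
        new_dist = list(dist)
        for s, t in edges:
            # relax in both directions (undirected edge)
            new_dist[t] = min(new_dist[t], dist[s] + 1.0)
            new_dist[s] = min(new_dist[s], dist[t] + 1.0)
        dist = new_dist
    return dist

# A chain 0-1-2-...-9: the far end only learns its distance after 9 layers.
chain = [(i, i + 1) for i in range(9)]
print(message_passing_distances(chain, 10, 0, num_layers=4)[9])  # inf
print(message_passing_distances(chain, 10, 0, num_layers=9)[9])  # 9.0
```

A hierarchy short-circuits this: a pooled graph shrinks distances, so information can cross the chain in far fewer rounds.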
In fact, several recent GNNs operate on a hierarchical graph structure. Graph U-Nets (Gao & Ji, 2019) form a hierarchy by downsampling the graph with iterative convolutions and top-k pooling, and then upsampling the pooled graph with iterative convolutions and unpooling operators. However, the U-shaped net only propagates the information for a single turn. GraphFPN (Zhao et al., 2021) builds mappings between the image and graph feature pyramids according to a superpixel hierarchy, and it applies GNN layers on the hierarchical graph to exchange information within the graph pyramid; still, the flow of inference propagates in a single pass over a fixed contextual-hierarchical-contextual structure, as shown in Fig. 1 of (Zhao et al., 2021). In this paper, we provide a novel perspective on hierarchical graph representation learning. We use differentiable graph pooling methods to create multi-scale graphs, also referred to as a graph pyramid in previous work (Zhao et al., 2021). Conditioning on the graph pooling results, we explicitly connect the multi-scale graphs into a mega graph according to how the nodes are pooled together (illustrated in Fig. 1). A straightforward way to learn on the mega graph is to adopt the naive message-passing strategy, but this abandons the hierarchical prior knowledge. Instead, we convolve the intra-graph edges and inter-graph edges separately. That is, we stack general graph convolutions over intra-graph edges, while convolutions over inter-graph edges follow a bidirectional pathway that delivers information along the hierarchy top-down and then back up. This process is repeated multiple times along two dimensions, i.e., the height of the graph hierarchy and the depth of the stacked layers.
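To make the mega-graph construction concrete, here is a minimal sketch (our own simplification, not the paper's implementation): given the node counts and intra-graph edges at each scale, plus per-scale cluster assignments produced by pooling, we re-index nodes globally and emit inter-graph edges connecting each node to the pooled node it belongs to at the next height. The function name and interface are assumptions for illustration.

```python
def build_mega_graph(num_nodes, scale_edges, assignments):
    """Connect multi-scale graphs into one 'mega graph'.

    num_nodes[h]   : node count of the graph at height h
    scale_edges[h] : intra-graph edges at height h (local node indices)
    assignments[h] : for each node at height h, the index of the pooled
                     node (cluster) it belongs to at height h + 1
    Returns (intra_edges, inter_edges) over globally re-indexed nodes.
    """
    # global offset of each scale's node block
    offsets, total = [], 0
    for n in num_nodes:
        offsets.append(total)
        total += n
    # intra-graph edges keep their scale, shifted to global indices
    intra = [(offsets[h] + s, offsets[h] + t)
             for h, edges in enumerate(scale_edges) for s, t in edges]
    # inter-graph edges follow the pooling assignment upward one height
    inter = [(offsets[h] + i, offsets[h + 1] + c)
             for h, assign in enumerate(assignments)
             for i, c in enumerate(assign)]
    return intra, inter

# A 4-node path pooled into 2 super-nodes (nodes {0,1} -> 0, {2,3} -> 1):
intra, inter = build_mega_graph(
    [4, 2], [[(0, 1), (1, 2), (2, 3)], [(0, 1)]], [[0, 0, 1, 1]])
print(intra)  # [(0, 1), (1, 2), (2, 3), (4, 5)]
print(inter)  # [(0, 4), (1, 4), (2, 5), (3, 5)]
```

Keeping `intra` and `inter` as separate edge sets mirrors the design choice above: the two groups are convolved by different pathways rather than merged into one flat edge list.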
To realize the above scheme, we adopt two core elementary operations, the graph full network (GFuN) and stridden edge contraction pooling (S-EdgePool), which extend conventional graph convolution and edge contraction pooling. We conduct comprehensive experiments on dozens of public datasets, in which we observe consistent performance gains compared to baselines. In particular, we establish five new graph theory benchmark datasets that require long-term inference and deduction to solve. On these tasks, MeGraph demonstrates dominant performance compared with popular graph neural networks. Our contributions can be summarized as follows. 1) We propose a novel mega graph structure with general usage for graph neural networks. Given the mega graph, we propose a specific network module that enables repeated information exchange across multi-scale graphs. 2) To control the scale of the pooled graphs, we design the S-EdgePool operator, which supports a variable pooling stride and pooling ratio. 3) We create five new graph theory benchmark tasks, including shortest path, maximum connected component, and graph diameter problems. The MeGraph model achieves clear improvements on most of these benchmarks compared to popular GNNs.
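As intuition for how an edge-contraction pooling with a controllable pooling ratio might behave, the following is a simplified, non-learned sketch (our own; in S-EdgePool the edge scores would come from a network and further details differ). Edges are contracted greedily by score until a target cluster count, `ratio * num_nodes`, is reached, with a cap on cluster size acting as the stride-like knob.

```python
def edge_contraction_pool(num_nodes, edges, edge_scores,
                          ratio=0.5, max_cluster_size=2):
    """Greedy edge contraction, highest-scoring edges first.

    Contracts edges until roughly `ratio * num_nodes` clusters remain,
    never letting a cluster exceed `max_cluster_size` nodes.
    Returns a cluster id per node (the pooling assignment).
    """
    parent = list(range(num_nodes))   # union-find forest
    size = [1] * num_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    clusters = num_nodes
    target = max(1, int(ratio * num_nodes))
    for _, (s, t) in sorted(zip(edge_scores, edges), reverse=True):
        if clusters <= target:
            break
        rs, rt = find(s), find(t)
        if rs != rt and size[rs] + size[rt] <= max_cluster_size:
            parent[rt] = rs              # contract the edge
            size[rs] += size[rt]
            clusters -= 1
    # relabel roots to consecutive cluster ids
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(num_nodes)]

# Chain of 6 nodes pooled to 3 clusters (ratio 0.5):
assign = edge_contraction_pool(
    6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], [5, 1, 4, 1, 3])
print(assign)  # [0, 0, 1, 1, 2, 2]
```

The returned assignment is exactly what the mega-graph construction consumes: each entry tells which pooled node a fine-level node maps to.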

2. NOTATION, BACKGROUND AND PRELIMINARIES

Let G = (V, E) be a graph with node set V (of cardinality N_v) and edge set E (of cardinality N_e). The edge set can be represented as E = {(s_k, t_k)}_{k=1:N_e}, where s_k and t_k are the indices of the source and target nodes connected by edge k. We define X^G as the features of graph G, which is a combination of global (graph-level) features u^G, node features V^G, and edge features E^G. Accordingly, we use V^G_i to denote the features of a specific node v_i, and E^G_k to denote the features of a specific edge (s_k, t_k). We may abuse notation by omitting the superscript G when there is no ambiguity from the context.
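One possible way to hold these quantities in code is a pair of small containers mirroring the notation; this is an illustrative assumption of ours, not the paper's data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Graph:
    """G = (V, E): nodes are implicit indices 0..num_nodes-1."""
    num_nodes: int                      # N_v
    edges: List[Tuple[int, int]]        # E = [(s_k, t_k)], length N_e

@dataclass
class GraphFeatures:
    """X^G = (u^G, V^G, E^G) in the paper's notation."""
    u: List[float]                      # global (graph-level) features
    V: List[List[float]]                # V[i]: features of node v_i
    E: List[List[float]]                # E[k]: features of edge (s_k, t_k)

g = Graph(num_nodes=3, edges=[(0, 1), (1, 2)])
x = GraphFeatures(u=[0.0], V=[[1.0]] * 3, E=[[0.5]] * 2)
```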

2.1. GRAPH NETWORK (GN) BLOCK

We follow the graph networks (GN) framework of (Battaglia et al., 2018). Using our notation, a GN block takes a graph G and features X = (u, V, E) as inputs, and outputs new features X′ = (u′, V′, E′). A full GN block (Battaglia et al., 2018) contains the following computational steps (where φ in each step denotes an update function that is usually a neural network):

1. Update edge features: E′_k = φ^e(E_k, V_{s_k}, V_{t_k}, u), ∀k ∈ [1 … N_e].

2. Update node features: V′_i = φ^v(ρ^{e→v}({E′_k}_{k∈[1…N_e], t_k=i}), V_i, u), ∀i ∈ [1 … N_v], where ρ^{e→v} is an edge-to-node aggregation function taking the features of incoming edges as inputs.
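The two steps above can be sketched directly, with sum as the edge-to-node aggregator ρ^{e→v} and the update functions φ^e, φ^v passed in as arguments (in practice they would be neural networks; the plain-function stand-ins here are ours).

```python
import numpy as np

def gn_block(edges, E, V, u, phi_e, phi_v):
    """Steps 1-2 of a GN block with sum as the e->v aggregator.

    edges : list of (s_k, t_k) pairs
    E     : (N_e, d) edge features;  V: (N_v, d) node features;  u: globals
    phi_e, phi_v : update functions (stand-ins for neural networks)
    """
    # 1. update each edge from its features, endpoint nodes, and globals
    E_new = np.stack([phi_e(E[k], V[s], V[t], u)
                      for k, (s, t) in enumerate(edges)])
    # 2. aggregate incoming edges per node, then update node features
    V_new = np.empty_like(V)
    for i in range(V.shape[0]):
        incoming = [E_new[k] for k, (_, t) in enumerate(edges) if t == i]
        agg = (np.sum(incoming, axis=0) if incoming
               else np.zeros(E_new.shape[1]))
        V_new[i] = phi_v(agg, V[i], u)
    return E_new, V_new

# tiny check with additive stand-ins for phi_e and phi_v
phi_e = lambda e, vs, vt, u: e + vs + vt
phi_v = lambda agg, v, u: agg + v
E2, V2 = gn_block([(0, 1), (1, 0)], np.ones((2, 2)), np.ones((2, 2)),
                  np.zeros(2), phi_e, phi_v)
print(E2)  # each edge feature becomes 1 + 1 + 1 = 3
print(V2)  # each node: sum of incoming (3) plus itself (1) = 4
```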



Figure 1: Illustration comparing the graph pyramid and the mega graph. The graph pyramid is formed by iterative graph pooling. Different shapes represent nodes at different scales (heights). The inter-graph edges generated during graph pooling connect the graph pyramid into a complete mega graph.

