ACCURATE LEARNING OF GRAPH REPRESENTATIONS WITH GRAPH MULTISET POOLING

Abstract

Graph neural networks have been widely used for modeling graph-structured data, achieving impressive results on node classification and link prediction tasks. Yet, obtaining an accurate representation of an entire graph further requires a pooling function that maps a set of node representations into a compact form. A simple sum or average over all node representations treats every node equally, without regard to its relevance to the task or to the structural dependencies among nodes. Recently proposed hierarchical graph pooling methods, on the other hand, may yield the same representation for two different graphs that are distinguishable by the Weisfeiler-Lehman test, as they suboptimally preserve information from the node features. To tackle these limitations of existing graph pooling methods, we first formulate the graph pooling problem as a multiset encoding problem with auxiliary information about the graph structure, and propose the Graph Multiset Transformer (GMT), a multi-head attention based global pooling layer that captures the interactions between nodes according to their structural dependencies. We show that GMT satisfies both injectiveness and permutation invariance, so that it is at most as powerful as the Weisfeiler-Lehman graph isomorphism test. Moreover, our method can be easily extended to previous node clustering approaches for hierarchical graph pooling. Our experimental results show that GMT significantly outperforms state-of-the-art graph pooling methods on graph classification benchmarks with high memory and time efficiency, and obtains even larger performance gains on graph reconstruction and generation tasks.

1. INTRODUCTION

Graph neural networks (GNNs) (Zhou et al., 2018; Wu et al., 2019), which operate on graph-structured data, have recently attracted considerable attention, as they can learn expressive representations for various graph-related tasks such as node classification, link prediction, and graph classification. While the majority of existing work on GNNs focuses on message passing strategies for neighborhood aggregation (Kipf & Welling, 2017; Hamilton et al., 2017), which aim to encode the nodes of a graph accurately, graph pooling (Zhang et al., 2018; Ying et al., 2018), which maps a set of nodes into a compact representation, is crucial for capturing a meaningful structure of the entire graph. The simplest approach to graph pooling is to average or sum all node features in the given graph (Atwood & Towsley, 2016; Xu et al., 2019) (Figure 1(B)). However, since such simple aggregation schemes treat all nodes equally without considering their relative importance for the given task, they cannot generate a meaningful graph representation in a task-specific manner. Their flat architectures also restrict hierarchical pooling and the compression of a graph into a few nodes.

To tackle these limitations, several differentiable pooling operations have been proposed to condense the given graph, and they fall into two dominant approaches. Node drop methods (Zhang et al., 2018; Lee et al., 2019b) (Figure 1(C)) obtain a score for each node using information from graph convolutional layers, and then drop unnecessary nodes with lower scores at each pooling step. Node clustering methods (Ying et al., 2018; Bianchi et al., 2019) (Figure 1(D)), on the other hand, coarsen the graph by clustering similar nodes into a single node, exploiting its hierarchical structure.
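To make the contrast concrete, the following is a minimal sketch, not the authors' implementation, comparing the flat sum/mean readouts described above with an attention-based pooling over node features in the spirit of GMT. It assumes node embeddings of shape [num_nodes, hidden_dim] produced by any GNN encoder; the names `AttentionPool` and `num_seeds` are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn


def sum_pool(x: torch.Tensor) -> torch.Tensor:
    # Flat readout: every node contributes equally, regardless of the task.
    return x.sum(dim=0)


def mean_pool(x: torch.Tensor) -> torch.Tensor:
    return x.mean(dim=0)


class AttentionPool(nn.Module):
    """Pools a set of node embeddings into `num_seeds` vectors with
    multi-head attention, so nodes are weighted by learned task relevance.
    (Hypothetical sketch; the actual GMT layer also injects graph structure.)"""

    def __init__(self, dim: int, num_seeds: int = 1, num_heads: int = 4):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(num_seeds, dim))  # learnable queries
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_nodes, dim] node embeddings of a single graph
        q = self.seeds.unsqueeze(0)    # [1, num_seeds, dim]
        kv = x.unsqueeze(0)            # [1, num_nodes, dim]
        out, _ = self.attn(q, kv, kv)  # attend over the node multiset
        return out.squeeze(0)          # [num_seeds, dim] graph representation


# Usage on random node embeddings
x = torch.randn(7, 32)                        # 7 nodes, 32-dim features
print(sum_pool(x).shape, mean_pool(x).shape)  # torch.Size([32]) torch.Size([32])
pool = AttentionPool(dim=32, num_seeds=1)
print(pool(x).shape)                          # torch.Size([1, 32])
```

Unlike the sum or mean readouts, the attention-based variant learns which nodes matter for the downstream task, and using more than one seed vector yields a condensed multi-node representation that can serve as input to further (hierarchical) pooling.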

