ACCURATE LEARNING OF GRAPH REPRESENTATIONS WITH GRAPH MULTISET POOLING

Abstract

Graph neural networks have been widely used for modeling graph data, achieving impressive results on node classification and link prediction tasks. Yet, obtaining an accurate representation for a graph further requires a pooling function that maps a set of node representations into a compact form. A simple sum or average over all node representations treats all node features equally, without considering their task relevance or the structural dependencies among them. Recently proposed hierarchical graph pooling methods, on the other hand, may yield the same representation for two different graphs that are distinguishable by the Weisfeiler-Lehman test, as they suboptimally preserve information from the node features. To tackle these limitations of existing graph pooling methods, we first formulate the graph pooling problem as a multiset encoding problem with auxiliary information about the graph structure, and propose the Graph Multiset Transformer (GMT), a multi-head attention-based global pooling layer that captures the interaction between nodes according to their structural dependencies. We show that GMT satisfies both injectiveness and permutation invariance, such that it is at most as powerful as the Weisfeiler-Lehman graph isomorphism test. Moreover, our method can be easily extended to previous node clustering approaches for hierarchical graph pooling. Our experimental results show that GMT significantly outperforms state-of-the-art graph pooling methods on graph classification benchmarks with high memory and time efficiency, and obtains even larger performance gains on graph reconstruction and generation tasks.

1. INTRODUCTION

Graph neural networks (GNNs) (Zhou et al., 2018; Wu et al., 2019), which work with graph-structured data, have recently attracted considerable attention, as they can learn expressive representations for various graph-related tasks such as node classification, link prediction, and graph classification. While the majority of existing works on GNNs focus on message-passing strategies for neighborhood aggregation (Kipf & Welling, 2017; Hamilton et al., 2017), which aim to encode the nodes in a graph accurately, graph pooling (Zhang et al., 2018; Ying et al., 2018), which maps the set of nodes into a compact representation, is crucial in capturing a meaningful structure of an entire graph. As the simplest approach to graph pooling, we can average or sum all node features in the given graph (Atwood & Towsley, 2016; Xu et al., 2019) (Figure 1 (B)). However, since such simple aggregation schemes treat all nodes equally without considering their relative importance for the given task, they cannot generate a meaningful graph representation in a task-specific manner. Their flat architectures also prevent hierarchical pooling or compression of the graph into a few nodes. To tackle these limitations, several differentiable pooling operations have been proposed to condense the given graph. There are two dominant approaches to pooling a graph. Node drop methods (Zhang et al., 2018; Lee et al., 2019b) (Figure 1 (C)) compute a score for each node using information from graph convolutional layers, and then drop unnecessary nodes with lower scores at each pooling step. Node clustering methods (Ying et al., 2018; Bianchi et al., 2019) (Figure 1 (D)), on the other hand, cluster similar nodes into a single node by exploiting their hierarchical structure. To obtain accurate representations of graphs, we need a graph pooling function that is as powerful as the WL test in distinguishing two different graphs.
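The limitation of simple aggregation schemes can be illustrated with a minimal sketch; the function name and the toy feature values below are hypothetical, not part of the paper's method.

```python
def mean_readout(node_feats):
    """Mean readout: average node features, ignoring graph structure
    and the relative task importance of individual nodes."""
    n, dim = len(node_feats), len(node_feats[0])
    return [sum(h[d] for h in node_feats) / n for d in range(dim)]

# Two hypothetical graphs with different node features but the same mean:
g1 = [[0.0], [2.0]]
g2 = [[1.0], [1.0]]
assert mean_readout(g1) == mean_readout(g2) == [1.0]  # indistinguishable
```

Because the average collapses both feature multisets to the same vector, any downstream classifier receives identical representations for these two different graphs.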
To this end, we first observe that graph representation learning can be regarded as a multiset encoding problem, which allows for possibly repeating elements, since a graph may have redundant node representations (see Figure 1, right). However, since a graph is more than a multiset due to its structural constraints, we further define the problem as graph multiset encoding, whose goal is to encode two different graphs, given as multisets of node features with auxiliary structural dependencies among them (see Figure 1, right), into two unique embeddings. We tackle this problem by utilizing a graph-structured attention unit. By leveraging this unit as a fundamental building block, we propose the Graph Multiset Transformer (GMT), a pooling mechanism that condenses the given graph into a set of representative nodes, and then further encodes the relationships between them to enhance the representation power of the graph. We theoretically analyze the connection between our pooling operations and the WL test, and further show that our graph multiset pooling function can be easily extended to node clustering methods. We then experimentally validate the graph classification performance of GMT on 10 benchmark datasets from biochemical and social domains, on which it significantly outperforms existing methods on most of them. However, since graph classification tasks only require discriminative information, to better quantify the amount of information about the graph retained in the condensed nodes after pooling, we further validate GMT on graph reconstruction of synthetic and molecule graphs, and also on two graph generation tasks, namely molecule generation and retrosynthesis. Notably, GMT outperforms baselines with an even larger performance gap on graph reconstruction, which demonstrates that it learns meaningful information without forgetting the original graph structure.
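The distinction between set and multiset encodings can be made concrete with a small example; the feature labels below are hypothetical stand-ins for (possibly repeated) node representations produced by a GNN encoder.

```python
from collections import Counter

# Node-level features of two hypothetical graphs after message passing
g1_feats = ["a", "a", "b"]
g2_feats = ["a", "b", "b"]

# A plain set encoding discards multiplicity and cannot tell the graphs
# apart, while a multiset (Counter) encoding preserves repeated elements.
as_sets_equal = set(g1_feats) == set(g2_feats)               # True
as_multisets_equal = Counter(g1_feats) == Counter(g2_feats)  # False
```

An injective pooling function must therefore operate on the multiset (and, for graph multiset encoding, also account for structural dependencies), not merely on the set of distinct node features.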
Finally, GMT improves graph generation performance on both tasks, which shows that it can be well coupled with other GNNs for graph representation learning. Our main contributions are summarized as follows:
• We treat the graph pooling problem as a multiset encoding problem, under which we consider relationships among nodes in a set with several attention units, to obtain a compact representation of an entire graph with a single global function, without additional message-passing operations.
• We show that existing GNNs with our parametric pooling operation can be as powerful as the WL test, and can also be easily extended to node clustering approaches with learnable clusters.
• We extensively validate GMT on graph classification, reconstruction, and generation tasks over synthetic and real-world graphs, on which it largely outperforms most graph pooling baselines.
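To give a flavor of the attention units on which such a pooling function is built, the following is a minimal single-head sketch in plain Python. The function name and the fixed query vector are illustrative assumptions: in the actual model the query corresponds to learnable seed vectors, attention is multi-headed, and graph structure enters the computation, none of which is shown here.

```python
import math

def attention_pool(node_feats, query):
    """Single-head attention pooling: a softmax-weighted sum of node
    features, with weights from scaled dot products between a query
    vector and each node (a simplified, hypothetical sketch)."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, h)) / math.sqrt(d)
              for h in node_feats]
    m = max(scores)                        # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(node_feats[0])
    return [sum(w * h[k] for w, h in zip(weights, node_feats))
            for k in range(dim)]

# Nodes aligned with the query receive a higher weight in the pooled vector:
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
assert pooled[0] > pooled[1]
```

Unlike a plain mean, the weighting is input-dependent, so the pooled representation can emphasize task-relevant nodes; since the softmax sum ranges over all nodes symmetrically, the output is also invariant to node ordering.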

Figure 1: Concepts (Left): Conceptual comparison of graph pooling methods. The grey box indicates the readout layer, which is compatible with our method. The green check icon indicates models that can be as powerful as the WL test. (Right): An illustration of set, multiset, and graph multiset encodings for graph representation.

Both graph pooling approaches have obvious drawbacks. First, node drop methods unnecessarily drop some nodes at every pooling step, leading to information loss on the discarded nodes. Node clustering methods, on the other hand, compute a dense cluster assignment matrix from the adjacency matrix, which prevents them from exploiting sparsity in the graph topology and leads to excessively high computational complexity (Lee et al., 2019b). Furthermore, to accurately represent a graph, a GNN should obtain a representation that is as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test (Weisfeiler & Leman, 1968), such that it can map two different graphs onto two distinct embeddings. While recent message-passing operations satisfy this constraint (Morris et al., 2019; Xu et al., 2019), most deep graph pooling methods (Ying et al., 2018; Lee et al., 2019b; Gao & Ji, 2019; Bianchi et al., 2019) overlook graph isomorphism, except for a few (Zhang et al., 2018).
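The information loss of node drop methods can be sketched in a few lines; the function below is a hypothetical simplification of score-based top-k pooling, not any specific published operator.

```python
def topk_drop(node_feats, scores, k):
    """Node drop pooling (sketch): keep only the k highest-scoring
    nodes and discard the rest, losing their information entirely."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(order[:k])              # preserve original node order
    return [node_feats[i] for i in keep]

# The lowest-scoring node is discarded, whatever its features contain:
kept = topk_drop([[1.0], [2.0], [3.0]], scores=[0.1, 0.9, 0.5], k=2)
assert kept == [[2.0], [3.0]]
```

Since the dropped features never reach later layers, any information they carried about the graph is unrecoverable, which is exactly the drawback discussed above.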

