GRAPH INFORMATION BOTTLENECK FOR SUBGRAPH RECOGNITION

Abstract

Given an input graph and its label/property, several key problems of graph learning, such as finding interpretable subgraphs, graph denoising and graph compression, can be attributed to the fundamental problem of recognizing a subgraph of the original graph. This subgraph should be as informative as possible, yet contain as little redundant and noisy structure as possible. This problem setting is closely related to the well-known information bottleneck (IB) principle, which, however, has been less studied for irregular graph data and graph neural networks (GNNs). In this paper, we propose a framework of Graph Information Bottleneck (GIB) for the subgraph recognition problem in deep graph learning. Under this framework, one can recognize the maximally informative yet compressive subgraph, named the IB-subgraph. However, the GIB objective is notoriously hard to optimize, mostly due to the intractability of mutual information on irregular graph data and the unstable optimization process. To tackle these challenges, we propose: i) a GIB objective based on a mutual information estimator for irregular graph data; ii) a bi-level optimization scheme to maximize the GIB objective; iii) a connectivity loss to stabilize the optimization process. We evaluate the properties of the IB-subgraph in three application scenarios: improvement of graph classification, graph interpretation and graph denoising. Extensive experiments demonstrate that the information-theoretic IB-subgraph enjoys superior graph properties.

1. INTRODUCTION

Classifying the underlying labels or properties of graphs is a fundamental problem in deep graph learning, with applications across many fields such as biochemistry and social network analysis. However, real-world graphs are likely to contain redundant or even noisy information (Franceschi et al., 2019; Yu et al., 2019), which has a strong negative impact on graph classification. This raises the interesting problem of recognizing an informative yet compressed subgraph of the original graph. For example, in drug discovery, when viewing molecules as graphs with atoms as nodes and chemical bonds as edges, biochemists are interested in identifying the subgraphs that best represent certain properties of the molecules, namely the functional groups (Jin et al., 2020b; Gilmer et al., 2017). In graph representation learning, the predictive subgraph highlights the substructure vital for graph classification, and provides an alternative way of producing graph representations besides mean/sum aggregation (Kipf & Welling, 2017; Velickovic et al., 2017; Xu et al., 2019) and pooling aggregation (Ying et al., 2018; Lee et al., 2019; Bianchi et al., 2020). In graph attack and defense, it is vital to purify a perturbed graph and mine robust structures for classification (Jin et al., 2020a). Recently, the mechanism of self-attentive aggregation (Li et al., 2019) discovers vital substructures at the node level given a well-selected threshold. However, this method only identifies isolated important nodes and ignores the topological information at the subgraph level. This leads to the novel challenge of subgraph recognition: How can we recognize a compressed subgraph with minimal information loss in terms of predicting the graph labels/properties?
The above challenge resembles a well-known problem setting in information theory, the information bottleneck (IB) principle (Tishby et al., 1999), which aims to extract a compressed version of the original data that keeps the information most predictive of the labels or properties. Enhanced with deep learning, IB can learn informative representations from regular data in the fields of computer vision (Peng et al., 2019; Alemi et al., 2017; Luo et al., 2019), reinforcement learning (Goyal et al., 2019; Igl et al., 2019) and natural language processing (Wang et al., 2020). However, current IB methods, such as VIB (Alemi et al., 2017), are still incapable of handling irregular graph data. It remains challenging for IB to compress irregular graph data, such as a subgraph of an original graph, with minimal information loss. Hence, we advance the IB principle to irregular graph data in order to resolve the proposed subgraph recognition problem, which leads to a novel principle, the Graph Information Bottleneck (GIB). Different from prior IB research, which aims to learn an optimal representation of the input data in a hidden space, GIB directly reveals the vital substructure at the subgraph level. We first i) leverage the mutual information estimator from the Deep Variational Information Bottleneck (VIB) (Alemi et al., 2017) on irregular graph data as the GIB objective. However, mutual information is intractable to compute without knowing the forms of the distributions, especially on graph data. To tackle this issue, ii) we adopt a bi-level optimization scheme to maximize the GIB objective. Meanwhile, the continuous relaxation that we adopt to approximate the discrete selection of a subgraph leads to an unstable optimization process. To further stabilize training and encourage a compact subgraph, iii) we propose a novel connectivity loss that assists GIB in effectively discovering the maximally informative yet compressed subgraph, which we define as the IB-subgraph.
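In its standard form (our notation here; the paper's precise objective is developed in later sections), the IB principle and its subgraph analogue can be written as:

```latex
% Classic IB: compress the input X into a code Z that stays predictive of Y
\max_{Z}\; I(Y; Z) \;-\; \beta\, I(X; Z)

% Subgraph analogue (GIB): the compressed variable is itself a subgraph of G
\max_{G_{\mathrm{sub}} \subseteq G}\; I(Y; G_{\mathrm{sub}}) \;-\; \beta\, I(G; G_{\mathrm{sub}})
```

Here $I(\cdot\,;\cdot)$ denotes mutual information and $\beta > 0$ trades off predictiveness of the label against compression of the input; the difficulty on graphs is that neither mutual information term admits a closed form.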
By optimizing the above GIB objective and connectivity loss, one can recognize the IB-subgraph without any explicit subgraph annotation. Moreover, iv) GIB is model-agnostic and can be easily plugged into various graph neural networks (GNNs). We evaluate the properties of the IB-subgraph in three application scenarios: improvement of graph classification, graph interpretation, and graph denoising. Extensive experiments on both synthetic and real-world datasets demonstrate that the information-theoretic IB-subgraph enjoys superior graph properties compared to the subgraphs found by state-of-the-art baselines.
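To make the role of a connectivity loss concrete, the sketch below is an illustrative NumPy instantiation of the idea (our own construction, not the paper's exact loss): a soft assignment matrix S splits the nodes into subgraph/background parts, and the loss penalizes assignments whose two parts are heavily connected to each other, i.e., assignments that cut many edges and therefore yield a fragmented subgraph.

```python
import numpy as np

def connectivity_loss(S, A):
    """Illustrative connectivity loss: push the row-normalized 2x2
    block-connectivity matrix S^T A S toward the identity, so that most
    edges stay inside the subgraph part or inside the background part.
    S: (n, 2) soft subgraph/background assignment; A: (n, n) adjacency."""
    C = S.T @ A @ S                                  # 2x2 block-connectivity
    C = C / (C.sum(axis=1, keepdims=True) + 1e-8)    # row-normalize
    return np.linalg.norm(C - np.eye(2), ord="fro")

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Keeping one triangle as the subgraph cuts only one edge...
S_compact = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], float)
# ...while scattering nodes across both parts cuts many edges.
S_scattered = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]], float)

assert connectivity_loss(S_compact, A) < connectivity_loss(S_scattered, A)
```

In the paper's setting the assignment probabilities come from a learned node scorer rather than being hand-specified, and this term is optimized jointly with the GIB objective; the toy graph above only demonstrates that the loss prefers compact, well-connected subgraphs.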

2. RELATED WORK

Graph Classification. In recent literature, there has been a surge of interest in adopting graph neural networks (GNNs) for graph classification. The core idea is to aggregate all the node information into a graph representation. A typical implementation is mean/sum aggregation (Kipf & Welling, 2017; Xu et al., 2019), which averages or sums up the node embeddings. An alternative is to leverage the hierarchical structure of graphs, which leads to pooling aggregation (Ying et al., 2018; Zhang et al., 2018; Lee et al., 2019; Bianchi et al., 2020). When applied to redundant and noisy graphs, these approaches are likely to yield sub-optimal graph representations. Recently, InfoGraph (Sun et al., 2019) maximizes the mutual information between graph representations and multi-level local representations to obtain more informative global representations.

Information Bottleneck. The information bottleneck (IB) principle, originally proposed for signal processing, attempts to find a short code of the input signal that preserves maximum information about the target (Tishby et al., 1999). Alemi et al. (2017) first bridged the gap between IB and deep learning, and proposed the variational information bottleneck (VIB). Nowadays, IB and VIB have been widely employed in computer vision (Peng et al., 2019; Luo et al., 2019), reinforcement learning (Goyal et al., 2019; Igl et al., 2019), natural language processing (Wang et al., 2020) and speech and acoustics (Qian et al., 2020) due to their capability of learning compact and meaningful representations. However, IB is less researched on irregular graphs due to the intractability of mutual information.

Subgraph Discovery. Traditional subgraph discovery includes dense subgraph discovery and frequent subgraph mining. Dense subgraph discovery aims to find the subgraph with the highest density (e.g., the number of edges over the number of nodes (Fang et al., 2019; Gionis & Tsourakakis, 2015)).
Frequent subgraph mining looks for the most common substructures among a set of graphs (Yan & Han, 2002; Ketkar et al., 2005; Zaki, 2005). At the node level, researchers discover the vital substructure

