OPTIMAL TRANSPORT-BASED SUPERVISED GRAPH SUMMARIZATION

Abstract

Graph summarization is the problem of producing smaller graph representations of an input graph dataset, such that the compressed graphs capture structural information relevant for downstream tasks. One graph summarization method, recently proposed in Garg & Jaakkola (2019), formulates an optimal transport-based framework that allows prior information about node, edge, and attribute importance to be incorporated into the graph summarization process. We consider the problem of graph summarization in a supervised setting, wherein we seek to preserve relevant information about a class label. We first formulate this problem in terms of maximizing the Shannon mutual information between the summarized graph and the class label. We propose a method that incorporates mutual information estimates between random variables associated with sample graphs and class labels into the optimal transport compression framework. We empirically demonstrate performance improvements over previous works in terms of classification accuracy and runtime on synthetic and certain real datasets. We also theoretically explore the limitations of the optimal transport approach for the supervised summarization problem and show that it fails to satisfy a certain desirable information monotonicity property.

1. INTRODUCTION

Machine learning involving graphs has a wide range of applications in artificial intelligence Scarselli et al. (2008); Dessì et al. (2020), network analysis, and biological interactions Han et al. (2019); Chen et al. (2020). Graph classification problems use the network structure of the underlying data to improve predictive decision outcomes. However, graph datasets are often enormous, and the algorithms used to extract relevant information from graphs are frequently computationally expensive. Graph summarization addresses these scalability issues by computing reduced representations of graph datasets while retaining relevant information. As with numerous other problems in machine learning, the precise meaning of "reduced representation" does not have one single mathematical definition, and there is no single objective function being optimized. There are thus various approaches to this problem; for a survey, see Liu et al. (2018). The type of approach of interest in this paper takes a dataset of graphs and a number k as input and outputs, for each graph G in the dataset, a subgraph H ⊆ G induced by k vertices. Optimal transport, the general problem of moving one distribution of mass to another as efficiently as possible, has been used in many recent graph-related problems, such as graph matching via the Gromov-Wasserstein distance Xu et al. (2019). One recent approach to the graph summarization problem that allows for the incorporation of user-engineered prior information is the Optimal Transport based Compression (OTC) approach of Garg & Jaakkola (2019). Their approach is as follows: a graph G, a target number k of vertices, a probability distribution ρ 0 on the vertices of G, and a cost function c : E(G) → R, where E(G) denotes the set of edges of G, are given as input.
A probability distribution ρ 1 is computed by minimizing the Wasserstein distance on G between ρ 0 and ρ 1 with respect to the cost function c, subject to the constraint that the number of vertices in the support of ρ 1 is at most k. The output subgraph H is the one induced by the vertices in the support of ρ 1 . Prior information can be incorporated into the method via appropriately choosing ρ 0 and c, but in the prior work, this "prior information" is not learned, and ρ 0 and c are set heuristically.
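To illustrate the flavor of this selection step, note that when ρ 1 is otherwise unconstrained on its support S, each unit of ρ 0 -mass optimally moves to its nearest support vertex, so the minimal transport cost is the ρ 0 -weighted distance to S, and choosing S becomes a k-median-style problem. The sketch below implements a simple greedy heuristic for that reduction; it is not the authors' algorithm, and the function name `otc_summarize` and the greedy strategy are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def otc_summarize(adj_costs, rho0, k):
    """Greedily pick k support vertices minimizing the cost of moving rho0.

    adj_costs: (n, n) array of edge costs (np.inf where no edge exists).
    rho0: length-n probability vector over the vertices.
    k: target support size.
    Returns the list of chosen vertex indices (the support of rho1).
    """
    # All-pairs shortest-path distances induced by the edge cost function c.
    d = shortest_path(adj_costs, method="D")
    n = len(rho0)
    support = []
    # Distance from each vertex to the nearest chosen support vertex so far.
    best = np.full(n, np.inf)
    for _ in range(k):
        # Transport cost if candidate s were added to the support.
        gains = [(rho0 * np.minimum(best, d[:, s])).sum() for s in range(n)]
        s = int(np.argmin(gains))
        support.append(s)
        best = np.minimum(best, d[:, s])
    return support

# Example: path graph 0-1-2-3 with unit edge costs, uniform rho0, k = 2.
inf = np.inf
A = np.array([[0, 1, inf, inf],
              [1, 0, 1, inf],
              [inf, 1, 0, 1],
              [inf, inf, 1, 0]], dtype=float)
rho0 = np.full(4, 0.25)
print(sorted(otc_summarize(A, rho0, 2)))  # → [1, 2]
```

The output subgraph H would then be the one induced by the returned vertices; a learned ρ 0 or c, as proposed in this paper, simply changes the inputs to this selection.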

