SEEDGNN: GRAPH NEURAL NETWORK FOR SUPERVISED SEEDED GRAPH MATCHING

Abstract

There has been significant interest in designing Graph Neural Networks (GNNs) for seeded graph matching, which aims to match two (unlabeled) graphs using only topological information and a small set of seeds. However, most previous GNNs for seeded graph matching employ a semi-supervised approach, which requires a large number of seeds and cannot learn knowledge transferable to unseen graphs. In contrast, this paper proposes a new supervised approach that can learn from a training set how to match unseen graphs with only a few seeds. At the core of our SeedGNN architecture are two novel modules: 1) a convolution module that can easily learn to count and use witnesses of different hops; 2) a percolation module that can use easily-matched pairs as new seeds to percolate and match other nodes. We evaluate SeedGNN on both synthetic and real graphs, and demonstrate significant performance improvement over both non-learning and learning algorithms in the existing literature. Further, our experiments confirm that the knowledge learned by SeedGNN from training graphs can be generalized to test graphs of different sizes and categories.

1. INTRODUCTION

Graph matching, also known as network alignment, aims to find the node correspondence between two graphs that maximally aligns their edge sets. As a ubiquitous but challenging problem, graph matching has numerous applications, including social network analysis (Narayanan et al., 2008; 2009; Zafarani et al., 2015; Zhang et al., 2015b; a; Chiasserini et al., 2016), computer vision (Conte et al., 2004; Schellewald et al., 2005; Vento et al., 2013), natural language processing (Haghighi et al., 2005), and computational biology (Singh et al., 2008; Kazemi et al., 2016; Kriege et al., 2019). This paper focuses on seeded graph matching, where a small portion of the node correspondence between the two graphs is revealed as seeds, and we seek to complete the correspondence by growing from the few seeded node pairs. Seeded graph matching is motivated by the fact that, in many real applications, the correspondence between a small portion of the two node sets is naturally available. For example, in social network de-anonymization, some users who explicitly link their accounts across different social networks could become seeds (Narayanan et al., 2008; 2009). Knowledge of even a few seeds has been shown to significantly improve the matching results for many real-world graphs (Kazemi et al., 2015; Fishkind et al., 2019). Recently, the Graph Neural Network (GNN) approach for graph matching has attracted much research attention. Although such a machine-learning-based approach usually does not possess provable theoretical guarantees, it has the potential to learn valuable features from a large set of training data. Unfortunately, to date GNNs have not been successfully applied to seeded graph matching.
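To make the seeded matching objective stated at the start of this section concrete, the following is a minimal sketch, assuming small undirected graphs given as edge lists. The function names `aligned_edges` and `best_completion` are hypothetical, and the exhaustive search is for illustration only; it is not the method proposed in this paper and is feasible only for tiny graphs.

```python
from itertools import permutations


def aligned_edges(edges1, edges2, mapping):
    """Count edges (u, v) of graph 1 whose images under `mapping` are edges of graph 2."""
    e2 = {frozenset(e) for e in edges2}
    return sum(1 for u, v in edges1 if frozenset((mapping[u], mapping[v])) in e2)


def best_completion(nodes1, nodes2, edges1, edges2, seeds):
    """Exhaustively search for the correspondence that extends `seeds`
    and aligns the largest number of edges (illustrative, exponential time)."""
    seeded_targets = set(seeds.values())
    free1 = [u for u in nodes1 if u not in seeds]
    free2 = [v for v in nodes2 if v not in seeded_targets]
    best_map, best_score = None, -1
    for perm in permutations(free2):
        # Keep the seed pairs fixed and try one assignment of the unseeded nodes.
        candidate = {**seeds, **dict(zip(free1, perm))}
        score = aligned_edges(edges1, edges2, candidate)
        if score > best_score:
            best_map, best_score = candidate, score
    return best_map, best_score


# Toy example (assumed data): two relabeled copies of a 4-node path, one seed pair.
nodes1, edges1 = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
nodes2, edges2 = ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]
seeds = {0: "a"}

mapping, score = best_completion(nodes1, nodes2, edges1, edges2, seeds)
print(mapping, score)
```

In this toy instance, fixing the single seed pair removes the ambiguity between the two orientations of the path, which illustrates why even a few seeds can help complete the correspondence.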
Most previous GNNs for seeded graph matching are limited to a semi-supervised learning paradigm, which only operates on a single pair of graphs (Zhang et al., 2019; Li et al., 2019a; b; c; Zhou et al., 2019; Chen et al., 2020; Derr et al., 2021) and treats the seed set as the labeled training data. The goal is to learn useful features from the seed set, and then to generalize the knowledge to the remaining unseeded nodes. This semi-supervised learning, however, suffers from two major limitations. First, in order to obtain high matching accuracy, the set of seeds needs to be sufficiently large, which is often unrealistic in practice. Second, as this semi-supervised setting only learns within a given pair of graphs, it makes no attempt to transfer knowledge from one pair of graphs to other pairs of unseen graphs, which severely limits GNNs' potential in distilling the common knowledge from a large set of training graphs. A natural but fundamental question is

