D²MATCH: LEVERAGING DEEP LEARNING AND DEGENERACY FOR SUBGRAPH MATCHING

Abstract

Subgraph matching is a fundamental building block for many graph-based applications and is challenging due to its high-order combinatorial nature. Previous methods usually tackle it by combinatorial optimization or representation learning, and suffer from either exponential computational cost or a lack of theoretical guarantees. In this paper, we develop D²Match by leveraging the efficiency of Deep learning and Degeneracy for subgraph matching. More specifically, we prove that subgraph matching can degenerate to subtree matching, which in turn is equivalent to finding a perfect matching on a bipartite graph. This matching procedure can be implemented by the built-in tree-structured aggregation mechanism of graph neural networks, which yields linear time complexity. Moreover, circle structures, abstracted as supernodes, and node attributes can be easily incorporated into D²Match to boost matching performance. Finally, we conduct extensive experiments to show the superior performance of D²Match and confirm that D²Match indeed exploits the subtrees, in contrast to existing learning-based subgraph matching methods, which depend on memorizing the data distribution divergence.

1. INTRODUCTION

Graphs serve as a common language for modeling a wide range of applications (Georgousis et al., 2021) because of their capacity to abstract complex structures. Notably, subgraph isomorphism, also known as subgraph matching at the node level (McCreesh et al., 2018), is a critical yet particularly challenging graph-related task. Subgraph matching aims to determine whether a query graph is isomorphic to a subgraph of a large target graph. It is an essential building block for many applications, as it can be used for alignment (Chen et al., 2020), canonicalization (Zhou & Torre, 2009), motif matching (Milo et al., 2002; Peng et al., 2020), etc. Previous work addresses subgraph matching in two main streams, i.e., combinatorial optimization (CO)-based and learning-based methods (Vesselinova et al., 2020). Early algorithms often formulate subgraph matching as a CO problem that aims to find all exact matches in a target graph. Unfortunately, this problem is NP-complete (Ullmann, 1976; Cordella et al., 2004), and these algorithms suffer from exponential time cost. To alleviate the computational cost, researchers have employed approximate techniques to seek inexact solutions (Mongiovì et al., 2010; Yan et al., 2005; Shang et al., 2008). An alternative is to frame subgraph matching as a machine learning problem (Bai et al., 2019; Rex et al., 2020; Bai et al., 2020) by computing the similarity of representations learned at the node or graph level from the two graphs. Though learning-based models can attain a solution in polynomial time, they provide little theoretical guarantee, making the results suboptimal and hard to interpret. Worse, learning-based methods often cannot recover the exactly matched subgraphs. Ideally, we hope to develop a subgraph matching algorithm that leverages the efficiency of learning methods while still maintaining theoretical guarantees.
We approach this by building the connection between subgraph matching and perfect matching on a bipartite graph. We prove that finding the corresponding nodes between the query graph and the target graph is equivalent to recursively checking whether there is a perfect matching on the bipartite graphs formed by the nodes of the query graph and the target graph, which yields a much more efficient subgraph matching algorithm that runs in polynomial time.
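
To make the recursive bipartite check concrete, the following is a minimal combinatorial sketch in Python, not the GNN-based implementation developed in this paper. It assumes unattributed graphs given as adjacency dictionaries, and it interprets "perfect matching" as a matching that covers every query-side neighbor. The helper names `saturating_matching_exists` and `subtree_match` are hypothetical and introduced only for illustration; the matching routine is a plain augmenting-path search (Kuhn's algorithm).

```python
def saturating_matching_exists(left, right, compatible):
    """Return True if every node in `left` can be paired with a distinct
    node in `right` such that compatible(u, v) holds for each pair.

    Plain augmenting-path bipartite matching (Kuhn's algorithm).
    """
    match_of_right = {}  # right node -> left node currently assigned to it

    def try_assign(u, visited):
        for v in right:
            if v in visited or not compatible(u, v):
                continue
            visited.add(v)
            # Take v if it is free, or if its current partner can move elsewhere.
            if v not in match_of_right or try_assign(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    return all(try_assign(u, set()) for u in left)


def subtree_match(query_adj, target_adj, depth):
    """Return the (query node, target node) pairs that survive `depth`
    rounds of the recursive bipartite check (illustrative assumption:
    no node attributes, so every pair is a candidate at depth 0).
    """
    candidates = {(q, t) for q in query_adj for t in target_adj}
    for _ in range(depth):
        previous = candidates
        # Keep (q, t) only if the neighbors of q can be matched to distinct
        # neighbors of t that were themselves candidates in the last round.
        candidates = {
            (q, t)
            for (q, t) in previous
            if saturating_matching_exists(
                query_adj[q],
                target_adj[t],
                lambda qn, tn: (qn, tn) in previous,
            )
        }
    return candidates


# Query: path a - b - c. Target: star with center x and leaves y, z, w.
query = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
target = {"x": ["y", "z", "w"], "y": ["x"], "z": ["x"], "w": ["x"]}
pairs = subtree_match(query, target, depth=2)
```

In this toy example the middle query node `b` can only correspond to the star center `x`, because no leaf has two neighbors to match against. Each round performs one bipartite check per surviving node pair, so the overall cost stays polynomial in the graph sizes, although this sketch makes no attempt to reach the linear complexity of the aggregation-based procedure.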

