D2MATCH: LEVERAGING DEEP LEARNING AND DEGENERACY FOR SUBGRAPH MATCHING

Abstract

Subgraph matching is a fundamental building block for many graph-based applications and is challenging due to its high-order combinatorial nature. Previous methods usually tackle it through combinatorial optimization or representation learning, and they suffer either from exponential computational cost or from matching without theoretical guarantees. In this paper, we develop D2Match by leveraging the efficiency of Deep learning and Degeneracy for subgraph matching. More specifically, we prove that subgraph matching can degenerate to subtree matching, which in turn is equivalent to finding a perfect matching on a bipartite graph. This matching procedure can be implemented by the built-in tree-structured aggregation mechanism of graph neural networks, which yields linear time complexity. Moreover, circle structures, abstracted as supernodes, and node attributes can be easily incorporated into D2Match to boost the matching. Finally, we conduct extensive experiments that show the superior performance of D2Match and confirm that it indeed exploits the subtrees, in contrast to existing learning-based subgraph matching methods, which depend on memorizing the data distribution divergence.

1. INTRODUCTION

Graphs serve as a common language for modeling a wide range of applications (Georgousis et al., 2021) because of their superior performance in abstracting representations for complex structures. Notably, subgraph isomorphism, a.k.a. subgraph matching at the node level (McCreesh et al., 2018), is a critical yet particularly challenging graph-related task. Subgraph matching aims to determine whether a query graph is isomorphic to a subgraph of a large target graph. It is an essential building block for many applications, as it can be used for alignment (Chen et al., 2020), canonicalization (Zhou & Torre, 2009), motif matching (Milo et al., 2002; Peng et al., 2020), etc. Previous work addresses subgraph matching in two main streams, i.e., combinatorial optimization (CO)-based and learning-based methods (Vesselinova et al., 2020). Early algorithms often formulate subgraph matching as a CO problem that aims to find all exact matches in a target graph. Unfortunately, this problem is NP-complete (Ullmann, 1976; Cordella et al., 2004), so these algorithms suffer from exponential time cost. To alleviate the computational cost, researchers have employed approximate techniques to seek inexact solutions (Mongiovì et al., 2010; Yan et al., 2005; Shang et al., 2008). An alternative is to frame subgraph matching as a machine learning problem (Bai et al., 2019; Rex et al., 2020; Bai et al., 2020) by computing the similarity of the representations learned from the two graphs at the node or graph level. Though learning-based models can attain a solution in polynomial time, they provide little theoretical guarantee, making the results suboptimal and hard to interpret. Worse still, learning-based methods often cannot obtain the exactly matched subgraphs. Ideally, we hope to develop a subgraph matching algorithm that leverages the efficiency of learning methods while still maintaining theoretical guarantees.
We approach this by building the connection between subgraph matching and perfect matching on a bipartite graph. We prove that finding the corresponding nodes between the query graph and the target graph is equivalent to recursively checking whether there is a perfect matching on the bipartite graphs generated by the nodes of the query graph and the target graph, which yields a much more efficient subgraph matching algorithm that runs in polynomial time. This degeneracy allows us to harness the power of Graph Neural Networks (GNNs) to carry out the matching through the built-in tree-structured aggregation mechanism of GNNs. Operating on node-level correspondences produces a node matching matrix, which allows us to locate the matched subgraph directly. To incorporate more information, we augment the bipartite graph with supernodes, which wrap the circles, in the perfect matching procedure; see Fig. 1 for an illustration of the basic idea. Moreover, node attributes can be easily included accordingly.
Our contributions are three-fold: (1) We propose D2Match, a novel learning-based subgraph matching method that frames the subgraph matching problem as perfect matching on a bipartite graph. (2) We theoretically prove that this matching procedure can be implemented by the built-in tree-structured aggregation mechanism of GNNs and yields linear time complexity. Moreover, we can easily incorporate circle structures, abstracted as supernodes, and node attributes into D2Match to boost the performance. (3) Extensive empirical evaluations show that D2Match outperforms state-of-the-art subgraph matching methods by a substantial margin and reveal that existing learning-based methods tend to capture the data distribution divergence rather than perform matching.
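To make the recursive bipartite check concrete, the following is a minimal Python sketch of the idea rather than the paper's GNN-based implementation. It assumes undirected simple graphs stored as adjacency dictionaries, ignores node attributes and supernodes, and replaces the GNN aggregation with a classical augmenting-path matching routine. The names `has_saturating_matching` and `subtree_match` are hypothetical and introduced only for illustration. A pair (q, t) survives a round if the neighbors of q can be matched to distinct neighbors of t using only pairs that survived the previous round.

```python
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, List[Hashable]]
Pair = Tuple[Hashable, Hashable]


def has_saturating_matching(left: List[Hashable], right: List[Hashable],
                            compatible: Set[Pair]) -> bool:
    """Return True if every node in `left` can be matched to a distinct
    node in `right` using only pairs in `compatible` (augmenting paths)."""
    if len(left) > len(right):
        return False
    match_of_right: Dict[Hashable, Hashable] = {}

    def try_assign(u: Hashable, seen: Set[Hashable]) -> bool:
        for v in right:
            if (u, v) in compatible and v not in seen:
                seen.add(v)
                # v is free, or its current partner can be moved elsewhere
                if v not in match_of_right or try_assign(match_of_right[v], seen):
                    match_of_right[v] = u
                    return True
        return False

    return all(try_assign(u, set()) for u in left)


def subtree_match(query: Graph, target: Graph, depth: int) -> Set[Pair]:
    """Return the (query node, target node) pairs whose depth-`depth`
    neighborhoods pass the recursive bipartite matching check."""
    # Round 0: without attributes, every pair is a candidate (assumption).
    candidates: Set[Pair] = {(q, t) for q in query for t in target}
    for _ in range(depth):
        candidates = {
            (q, t) for (q, t) in candidates
            if has_saturating_matching(query[q], target[t], candidates)
        }
    return candidates


# Hypothetical toy graphs: a 3-node path, a 3-leaf star, and a 4-node path.
path3: Graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
star3: Graph = {"r": ["x", "y", "z"], "x": ["r"], "y": ["r"], "z": ["r"]}
path4: Graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}

surviving = subtree_match(path3, path4, depth=2)
```

The surviving pairs play the role of the nonzero entries of a node matching matrix. In this sketch the check is only the subtree condition described above; the cost of the explicit matching routine is not meant to reflect the linear-time complexity claimed for the GNN formulation.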

2. RELATED WORK

Subgraph matching checks whether a query graph is subgraph isomorphic to a target graph (McCreesh et al., 2018). Here, we highlight three main lines of related work:
Combinatorial optimization (CO)-based methods first tackled subgraph matching by modeling only the graph structure (Ullmann, 1976). More recent work exploits both graph structure and node attributes (Han et al., 2013; Shang et al., 2008). These combinatorial optimization methods often rely on backtracking (Priestley & Ward, 1994), i.e., heuristically performing matching on each pair of nodes from the query and the target graphs. Such methods suffer from exponential computing costs. One mitigation is to employ an inexact matching strategy. Early methods define metrics to measure the similarity between the query graph and the target graph, and subsequent algorithms follow this strategy with more complex metrics. For example, Mongiovì et al. (2010) convert the graph matching problem into a set-cover problem to attain a polynomial-complexity solution. Yan et al. (2005) introduce a thresholding method to filter out mismatched graphs. Khan et al. (2011) define a metric based on neighborhood similarity and employ an information propagation model to find similar graphs. Kosinov & Caelli (2002) and Caelli & Kosinov (2004) project the nodes into an aligned eigenspace and cluster them there for matching. However, most of these algorithms cannot scale to large graphs due to their high computational cost, and their hand-crafted features make them hard to generalize to complex tasks.
Learning-based methods typically compute the similarity between the query and target graphs, e.g., by comparing their embedding vectors. Bai et al. (2019) adopt GNNs to learn node representations of the query and target graphs and employ a neural tensor network to match the graph pairs.
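For contrast with the learning-based methods, the backtracking strategy used by the CO-based methods discussed above can be sketched as follows. This is a generic textbook-style search, not the algorithm of any specific cited paper. It assumes undirected simple graphs stored as adjacency dictionaries and looks for an injective, edge-preserving mapping; the name `find_subgraph_match` is hypothetical. Its worst-case running time is exponential, which is the cost that motivates both the inexact and the learning-based alternatives.

```python
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, List[Hashable]]


def find_subgraph_match(query: Graph, target: Graph) -> Optional[Dict[Hashable, Hashable]]:
    """Return an injective mapping from query nodes to target nodes that
    preserves every query edge, or None if no such mapping exists."""
    q_nodes = list(query)
    mapping: Dict[Hashable, Hashable] = {}
    used: Set[Hashable] = set()

    def extend(i: int) -> bool:
        if i == len(q_nodes):
            return True  # every query node has been placed
        q = q_nodes[i]
        for t in target:
            if t in used:
                continue
            # Each already-mapped neighbor of q must land on a neighbor of t.
            if all(mapping[n] in target[t] for n in query[q] if n in mapping):
                mapping[q] = t
                used.add(t)
                if extend(i + 1):
                    return True
                # Undo the choice and try the next target node (backtrack).
                del mapping[q]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical toy graphs for illustration.
triangle: Graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
path3: Graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
path4: Graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}

path_match = find_subgraph_match(path3, path4)
triangle_match = find_subgraph_match(triangle, path4)
```

Here the 3-node path embeds in the 4-node path, while the triangle does not, because the target contains no cycle.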
One immediate challenge is that a single graph embedding vector cannot capture the partial order information for subgraph matching. Thus, Rex et al. (2020) train a GNN model to represent



Figure 1: An illustration of the proposed degeneracy procedure for subgraph matching. Steps (1) and (2) determine the isomorphism of a pair (a, a′) by examining whether their corresponding subtrees, i.e., $T_a^{(l+1)}$

