SPECTRAL SUBGRAPH LOCALIZATION

Abstract

Several graph mining problems are based on some variant of the subgraph isomorphism problem: Given two graphs, G and Q, does G contain a subgraph isomorphic to Q? As this problem is NP-complete, many methods avoid addressing it explicitly. In this paper, we propose a method that solves the problem by localizing, i.e., finding the position of, Q in G, by means of an alignment among graph spectra. Finding a node correspondence from Q to G thereafter is relegated to a separate task, as an instance of the graph alignment problem. We demonstrate that our spectral approach outperforms a baseline based on the state-of-the-art method for graph alignment in terms of accuracy on real graphs and scales to hundreds of nodes as no other method does.

1. INTRODUCTION

Graph analysis tasks frequently require localizing a smaller target graph Q within a larger source graph G, i.e., finding a subgraph of G that is best aligned with Q. This type of problem may appear as subgraph discovery (Kuramochi & Karypis, 2001; Bianchini et al., 2018), where we need to find any target graph in G; as subgraph querying (Katsarou et al., 2015; Sun & Luo, 2019), where we determine whether a match for a target subgraph exists within a collection of source graphs; or as graph matching (Zhang & Tong, 2016), where we have to align corresponding nodes across two graphs, potentially of different sizes. Subgraph localization is of interest in practical applications such as localizing a smaller electronic circuit within a large circuit (Fyrbiak et al., 2019), detecting submolecules in bigger molecules (Najmanovich et al., 2008), and localizing parts of shapes in computational geometry (Rampini et al., 2019). For instance, the task of subcircuit detection (Fyrbiak et al., 2019) involves sampling multiple subgraphs and comparing the spectra of their adjacency matrices to that of the query subgraph. Despite the prevalence of the problem, current research has avoided tackling it directly, due to its NP-hardness.

[Figure 1: An instance of subgraph localization (left) and its solution (right).]

In this paper, we propose a novel spectral solution to the problem of subgraph localization, built around the notion of identifying the spectrum λ_Q of a graph Q within that of another graph G. Figure 1 visualizes an instance of the subgraph localization problem under our formulation; we aim to find a function δ that indicates which nodes in G correspond to Q. Our solution effectively recovers both the nodes belonging to the part and the edges that connect the part to the rest of the graph. This problem is an instance of inverse eigenvalue problems (Chu & Golub, 2005), the class of problems that aim to reconstruct a matrix from its spectrum. Our experimental study demonstrates that our approach tackles the subgraph localization problem more effectively than state-of-the-art neural competitors and showcases its applicability to the real-world problem of subgraph alignment.

In summary, our contributions are as follows:
• We propose a spectral formulation for the subgraph localization problem (Sec. 4).
• We show that our solution achieves the optimum value under mild conditions (Sec. 3).
• We experimentally validate the effectiveness of our solution on real and synthetic graphs (Sec. 5).

2. RELATED WORK

We review related work on five problems: subgraph isomorphism, subgraph discovery, subgraph querying, subgraph matching, and subgraph localization itself.

The subgraph isomorphism problem is to decide whether a source graph contains a target subgraph and return that exact subgraph in the source. In graph analytics, this problem is mainly solved for very small target subgraphs (≤ 10 nodes) and aims at exact matches. Several methods speed up this process by exploiting query specifics, such as patterns in multiple subgraph queries (Duong et al., 2021). By contrast, our method aims at bigger target subgraphs.

In subgraph discovery, a target subgraph is not given as input; instead, the problem is to identify interesting components of a source graph according to some criteria, e.g., those that appear frequently (Kuramochi & Karypis, 2001; 2004), achieve a density threshold (Lee et al., 2010; Qin et al., 2015), or form cliques (Bianchini et al., 2018).

In subgraph querying, the goal is to identify all source graphs among a collection that contain a query target subgraph, without necessarily indicating the position of that subgraph within the returned graphs (Katsarou et al., 2015; Sun & Luo, 2019; Sun et al., 2020). A closely related topic is subgraph retrieval, where the goal is to retrieve the most relevant graphs from a graph database, with relevance being measured by some score. In Roy et al. (2022), node embeddings are learned in order to produce a subgraph matching for the computation of the relevance score. In Li et al. (2019), nodes are matched in order to produce a graph similarity score without producing node embeddings as an intermediate step. In both cases, the queries are significantly smaller than ours.

The goal of subgraph matching is to match the nodes of a smaller graph to those of a subgraph in a bigger graph by minimizing some error criterion, possibly in the presence of available attribute information. Many methods for graph matching effectively solve a subgraph isomorphism problem, even though they are not specifically designed for this purpose (Zhang & Tong, 2016). Recent work (Lou et al., 2020; Li et al., 2019) employs deep neural models to learn node embeddings that are subsequently used for matching.

The problem of subgraph localization calls to detect a good fit, by some measure (Skitsas et al., 2023), of a target subgraph within a bigger source graph, without aiming for full isomorphism. This problem has been scarcely studied. A recent application in computer vision (Xu et al., 2020) uses subgraph localization to detect temporal actions, where a graph models actions and the temporal relations between them. However, this model uses edges for temporal aspects and inter-scene relations, and hence does not generalize to arbitrary graphs. An existing spectral solution (Candogan & Chandrasekaran, 2018) is limited to special families of graphs, such as cliques.

3. SUBGRAPH LOCALIZATION

All aforementioned problems have in common the search for one graph within another. We study the most generic form of this problem, which corresponds to the problem named subgraph localization in our previous discussion. That is, we aim to identify a subset of the nodes of a graph G corresponding to an input graph Q; we do not aim at an exact 1-to-1 correspondence among all graph elements, but simply to detect a set of best matches.

Problem 1. The subgraph localization problem for a graph G = ⟨V, E⟩, where V is a set of n nodes and E ⊆ V × V is a set of edges, and a query graph Q = ⟨V_Q, E_Q⟩ with n_Q = |V_Q|, n_Q < n, calls to find a set of nodes V_S ⊂ V, inducing a set of edges E_S ⊂ E, such that |V_S| = |V_Q| and there exists a bijective function f : V_S → V_Q between the nodes in V_S and those in V_Q such that (i, j) ∈ E_S if and only if (f(i), f(j)) ∈ E_Q.

In many applications of subgraph localization, we do not need to explicitly materialize the correspondence function f. Thus, we can eschew recovering an exact f and instead aim at finding an indicator function δ : V → {0, 1} such that δ(v) = 1 if v ∈ V_S, and δ(v) = 0 otherwise. At first glance, finding such an indicator function seems easier than recovering a bijective function f. However, even in this indicator-function formulation, the problem corresponds to the decision version of the subgraph isomorphism problem, which asks whether a graph G contains a subgraph isomorphic to Q.
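To make Problem 1 and the indicator function δ concrete, the following is a minimal brute-force sketch in Python. It is not the spectral method proposed in this paper: it simply enumerates every candidate node set of the right size and every bijection, so its cost grows exponentially, in line with the hardness of subgraph isomorphism. The function names (`localize`, `find_bijection`, `induced_edges`) and the toy graphs are hypothetical and chosen for illustration only; the sketch assumes simple undirected graphs without self-loops and integer node labels.

```python
from itertools import combinations, permutations


def norm(u, v):
    """Canonical form of an undirected edge: endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)


def induced_edges(nodes, edges):
    """Edge set E_S induced in the graph by the node set `nodes`."""
    node_set = set(nodes)
    return {norm(u, v) for u, v in edges if u in node_set and v in node_set}


def find_bijection(v_s, g_edges, q_nodes, q_edges):
    """Return a dict f: V_S -> V_Q with (i, j) in E_S iff (f(i), f(j)) in E_Q,
    or None if no such bijection exists."""
    e_s = induced_edges(v_s, g_edges)
    e_q = {norm(u, v) for u, v in q_edges}
    if len(e_s) != len(e_q):
        return None
    for perm in permutations(q_nodes):
        f = dict(zip(v_s, perm))
        # f is a bijection on nodes, so equal image sets means edges match both ways.
        if {norm(f[u], f[v]) for u, v in e_s} == e_q:
            return f
    return None


def localize(g_nodes, g_edges, q_nodes, q_edges):
    """Exhaustive search for V_S; returns (delta, f) or None if Q is not found."""
    for v_s in combinations(g_nodes, len(q_nodes)):
        f = find_bijection(v_s, g_edges, q_nodes, q_edges)
        if f is not None:
            delta = {v: int(v in v_s) for v in g_nodes}  # indicator function
            return delta, f
    return None


# Hypothetical toy instance: G is a path 1-2-3-4 attached to the cycle
# 4-5-6-7-4, plus a pendant node 8; Q is a 4-cycle.
G_NODES = [1, 2, 3, 4, 5, 6, 7, 8]
G_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (4, 7), (7, 8)]
Q_NODES = [1, 2, 3, 4]
Q_EDGES = [(1, 2), (2, 3), (3, 4), (4, 1)]

delta, f = localize(G_NODES, G_EDGES, Q_NODES, Q_EDGES)
print(delta)  # {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 0}
```

The returned `delta` alone answers the localization question, while `f` is the explicit node correspondence that many applications do not need. Even when only `delta` is requested, this sketch still has to search over bijections to certify a candidate node set, which illustrates why the indicator formulation is no easier in the worst case.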

