SPECTRAL SUBGRAPH LOCALIZATION

Abstract

Several graph mining problems are based on some variant of the subgraph isomorphism problem: Given two graphs, G and Q, does G contain a subgraph isomorphic to Q? As this problem is NP-complete, many methods avoid addressing it explicitly. In this paper, we propose a method that solves the problem by localizing, i.e., finding the position of, Q in G, by means of an alignment among graph spectra. Finding a node correspondence from Q to G thereafter is relegated to a separate task, as an instance of the graph alignment problem. We demonstrate that our spectral approach outperforms a baseline based on the state-of-the-art method for graph alignment in terms of accuracy on real graphs and scales to hundreds of nodes as no other method does.

1. INTRODUCTION

Graph analysis tasks frequently require localizing a smaller target graph Q within a larger source graph G, i.e., finding a subgraph of G that is best aligned with Q. This type of problem may appear as subgraph discovery (Kuramochi & Karypis, 2001; Bianchini et al., 2018), where we need to find any target graph in G; as subgraph querying (Katsarou et al., 2015; Sun & Luo, 2019), where we find out whether a target subgraph match exists within a collection of source graphs; or as graph matching (Zhang & Tong, 2016), where we have to align corresponding nodes across two graphs, potentially of different sizes. Such subgraph localization is of interest in practical applications such as localizing a smaller electronic circuit within a large circuit (Fyrbiak et al., 2019), detecting submolecules in bigger molecules (Najmanovich et al., 2008), and localizing parts of shapes in computational geometry (Rampini et al., 2019). For instance, the task of subcircuit detection (Fyrbiak et al., 2019) involves sampling multiple subgraphs and comparing the spectra of their adjacency matrices to that of the query subgraph. Despite the prevalence of the problem, current research has avoided tackling it directly, due to its NP-hardness.

In this paper, we propose a novel spectral solution to the problem of subgraph localization, built around the notion of identifying the spectrum $\lambda_Q$ of a graph Q within that of another graph G. Figure 1 visualizes an instance of the subgraph localization problem by our formulation; we aim to find a function δ that indicates which nodes in G correspond to Q. Our solution effectively recovers both the nodes belonging to the part and the edges that connect the part to the rest of the graph. This problem is an instance of the inverse eigenvalue problem (Chu & Golub, 2005), the class of problems that aim to reconstruct a matrix from its spectrum.
Our experimental study demonstrates that our approach tackles the subgraph localization problem more effectively than state-of-the-art neural competitors and showcases its applicability to the real-world problem of subgraph alignment. In summary, our contributions are as follows:
• We propose a spectral formulation for the subgraph localization problem (Sec. 4).
• We show that our solution achieves the optimum value under mild conditions (Sec. 3).
• We experimentally validate the effectiveness of our solution on real and synthetic graphs (Sec. 5).

2. RELATED WORK

We review related work on five problems: subgraph isomorphism, subgraph discovery, subgraph querying, subgraph matching, and subgraph localization itself.

The subgraph isomorphism problem is to decide whether a source graph contains a target subgraph and return that exact subgraph in the source. In graph analytics, this problem is mainly solved for very small target subgraphs (≤ 10 nodes) and aims at exact matches. Several methods speed up this process by exploiting query specifics, such as patterns in multiple subgraph queries (Duong et al., 2021). By contrast, our method aims at bigger target subgraphs.

In subgraph discovery, a target subgraph is not given as input; the problem is instead to identify interesting components of a source graph according to some criteria, e.g., those that appear frequently (Kuramochi & Karypis, 2001; 2004), achieve a density threshold (Lee et al., 2010; Qin et al., 2015), or form cliques (Bianchini et al., 2018).

In subgraph querying, the goal is to identify all source graphs among a collection that contain a query target subgraph, without necessarily indicating the position of that subgraph within the returned graphs (Katsarou et al., 2015; Sun & Luo, 2019; Sun et al., 2020). A closely related topic is subgraph retrieval, where the goal is to retrieve the most relevant graphs from a graph database, with relevance being measured by some score. In Roy et al. (2022), node embeddings are learned in order to produce a subgraph matching for the computation of the relevance score. In Li et al. (2019), nodes are matched in order to produce a graph similarity score without producing node embeddings as an intermediate step. In both cases, the queries are significantly smaller than ours.

The goal of subgraph matching is to match the nodes of a smaller graph to those of a subgraph in a bigger graph by minimizing some error criterion, possibly in the presence of available attribute information. Many methods for graph matching effectively solve a subgraph isomorphism problem, even though they are not specifically designed for this purpose (Zhang & Tong, 2016). Recent work (Lou et al., 2020; Li et al., 2019) employs deep neural models to learn node embeddings that are subsequently used for matching.

The problem of subgraph localization calls for detecting a good fit, by some measure (Skitsas et al., 2023), of a target subgraph within a bigger source graph, without aiming for full isomorphism. This problem has been scarcely studied. A recent application in computer vision (Xu et al., 2020) uses subgraph localization to detect temporal actions, where a graph models actions and the temporal relations between them. However, this model uses edges for temporal aspects and inter-scene relations, and hence does not generalize to arbitrary graphs. An existing spectral solution (Candogan & Chandrasekaran, 2018) is limited to special families of graphs, such as cliques.

3. SUBGRAPH LOCALIZATION

All aforementioned problems have in common the search for one graph within another. We study the most generic form of this problem, which corresponds to the problem named subgraph localization in our previous discussion. That is, we aim to identify a subset of the nodes of a graph G corresponding to an input graph Q; we do not aim at an exact 1-to-1 correspondence among all graph elements, but simply to detect a set of best matches.

Problem 1. The subgraph localization problem for a graph $G = \langle V, E \rangle$, where V is a set of n nodes and $E \subseteq V \times V$ is a set of edges, and a query graph $Q = \langle V_Q, E_Q \rangle$ with $n_Q = |V_Q|$, $n_Q < n$, calls to find a set of nodes $V_S \subset V$, inducing a set of edges $E_S \subset E$, such that $|V_S| = |V_Q|$ and there exists a bijective function $f : V_S \to V_Q$ between the nodes in $V_S$ and those in $V_Q$ such that for each $(i, j) \in E_S$ there exists $(f(i), f(j)) \in E_Q$ and vice versa.

In many applications of subgraph localization, we do not need to explicitly materialize the correspondence function f. Thus, we can eschew recovering an exact f and instead aim at finding an indicator function $\delta : V \to \{0, 1\}$ such that $\delta(u) = 1$ if $u \in V_Q$, and $\delta(u) = 0$ otherwise. At first glance, finding such an indicator function seems easier than recovering a bijective function f. However, even in this indicator-function formulation, the problem corresponds to the decision version of the subgraph isomorphism problem, which asks whether a graph G contains a subgraph isomorphic to another graph Q. Thus, the problem is still NP-complete. Even so, we further relax our requirements, allowing the function δ to be a binarized version of a continuous real-valued function $v : V \to \mathbb{R}$, taking the value 1 where v falls below a threshold τ:

$$\delta(u) = \begin{cases} 1 & \text{if } v(u) < \tau \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

This relaxed problem calls to find a real function, or, equivalently, a real vector $v \in \mathbb{R}^n$, with $n = |V|$, for a known permutation of nodes in the graph.
To overcome the requirement for a known node permutation, we consider a permutation-invariant spectral alignment approach reminiscent of the Hamiltonian operator used in shape analysis (Choukroun et al., 2018; Rampini et al., 2019). Before delving into the approach, we introduce the necessary notation.

Background. The adjacency matrix of a graph G with n nodes is an n × n matrix $A \in \{0, 1\}^{n \times n}$ where $A_{ij} = 1$ if $(i, j) \in E$, and 0 otherwise. The degree matrix D is an n × n diagonal matrix where each entry $d_{ii} = \sum_{j \neq i} A_{ij}$ holds the degree of node i. The graph Laplacian matrix is defined as $L = D - A$. The Laplacian matrix of an undirected graph is a symmetric positive semi-definite matrix, hence its eigenvalues $\lambda_1, \ldots, \lambda_n$ are real and non-negative. The spectrum $\lambda(M)$ of a matrix M is the ordered sequence $\lambda_1 \le \ldots \le \lambda_n$ of its eigenvalues. Correspondingly, a graph's spectrum is the spectrum of its Laplacian matrix.
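The notation above can be made concrete with a minimal NumPy sketch. The helper names `laplacian` and `spectrum` and the 3-node path graph are illustrative assumptions, not part of the paper's code.

```python
import numpy as np

def laplacian(A):
    """Return L = D - A for a symmetric 0/1 adjacency matrix without self-loops."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def spectrum(M):
    """Eigenvalues of a symmetric matrix, sorted in ascending order."""
    return np.linalg.eigvalsh(M)

# Illustrative graph: the path 0 - 1 - 2.
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
L_path = laplacian(A_path)
lam_path = spectrum(L_path)   # real and non-negative, since L is symmetric PSD
```

Since the graph is connected, the smallest eigenvalue is 0 with multiplicity one; the remaining eigenvalues of this path are 1 and 3.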

4. SPECTRAL SUBGRAPH LOCALIZATION

We examine how the presence of a subgraph within a graph affects the graph's spectrum. Spectral theory establishes that the spectrum of a subgraph interlaces with the spectrum of the graph. However, the problem is also non-trivially affected by nodes beside the subgraph. Still, if we could compensate for the effect of nodes other than the subgraph's nodes, the two spectra would be indistinguishable. Following this reasoning, we devise a novel objective for subgraph localization. To that end, we first propose an original connection between subgraph localization and inverse eigenvalue problems with structural constraints (Chu & Golub, 2005).

Inverse Eigenvalue Problem. The general additive inverse eigenvalue problem (AIEP) is defined as follows:

Problem 2 (AIEP, Problem 3.6 in Chu & Golub (2005)). Given an n × n matrix A, a special class of matrices $\mathcal{N}$, and a set of scalars $\{\lambda^Q_i\}_{i=1}^{k}$, find $X \in \mathcal{N}$ such that $\{\lambda(A + X)_i\}_{i=1}^{k} = \{\lambda^Q_i\}_{i=1}^{k}$.

A vast literature on this problem (see Chu & Golub (2005) and references therein) explores questions regarding the existence of solutions and numerical approximation algorithms for various special classes of matrices $\mathcal{N}$. A common variant of Problem 2 expresses the problem as a least squares problem between the spectra:

$$\min_{X \in \mathcal{N}} \|\lambda(A + X) - \lambda_Q\|_2.$$

In what follows, we establish a connection between the subgraph localization problem (Problem 1) and the additive inverse eigenvalue problem (Problem 2). Under the above formulation, we aim to find a v that, added to the diagonal of the Laplacian of G, renders its first $n_Q$ eigenvalues equal to those of the query graph. In addition to finding v, we aim to remove from G the edges that connect the identified part to the remaining nodes. To the best of our knowledge, this is the first time such a connection has been established, and the first time an AIEP with structural Laplacian constraints is considered.
To devise our solution for subgraph localization, we commence with an intuitive scenario. We assume that G has a number of clearly separated communities, one of which corresponds to the query graph Q. A community is defined by a cut, as nodes within the same community are better connected than nodes across communities. Without loss of generality, assume the graph comprises two distinct communities. In this case, G's Laplacian is a block matrix with two diagonal blocks $L_{11} \in \mathbb{R}^{n_Q \times n_Q}$ and $L_{22} \in \mathbb{R}^{(n - n_Q) \times (n - n_Q)}$ and a few entries in the blocks $L_{12} \in \mathbb{R}^{n_Q \times (n - n_Q)}$ and $L_{21} \in \mathbb{R}^{(n - n_Q) \times n_Q}$ representing edges across the two communities. The spectra $\lambda(L)$ of G and $\lambda(L_Q)$ of Q differ due to the nodes in $L_{22}$ and the edges in $L_{12}$ and $L_{21}$. We aim to transform L into a Hamiltonian (Choukroun et al., 2018), defined below, to cancel out this difference.

A Hamiltonian is an operator $H = L + \mathrm{diag}(v)$, where $v : V \to \mathbb{R}$ is a scalar real-valued function, called the potential, and L is the Laplacian. The Hamiltonian reduces to the Laplacian if the potential is 0. According to Rampini et al. (2019, Lemma 1), if we add to the diagonal of L a vector v having non-zero values, $v(u) > \tau$, limited to nodes u in $L_{22}$, i.e., outside $V_Q$, then the eigenvectors corresponding to eigenvalues $\lambda_i < \tau$ of the resulting spectrum $\lambda(L + \mathrm{diag}(v))$ will have non-zero values limited to the positions corresponding to nodes in $V_Q$, in effect rendering $\lambda(L + \mathrm{diag}(v))$ similar to $\lambda(L_Q)$.

Still, the non-zero entries between communities in $L_{12}$, $L_{21}$ affect the spectrum. To cancel that effect, we introduce a Laplacian editing matrix that removes the contribution of such edges to the Laplacian of the graph G:

$$E = \begin{pmatrix} -\mathrm{diag}(L_{12}\mathbf{1}) & L_{12} \\ L_{21} & -\mathrm{diag}(L_{21}\mathbf{1}) \end{pmatrix}$$

where $L_{12}\mathbf{1}$ (resp. $L_{21}\mathbf{1}$) corrects the degree of the nodes after removing the edges in $L_{12}$ (resp. $L_{21}$). In effect, the corrected Laplacian $L - E$ is equivalent to the Laplacian of a graph with two connected components, one of which is isomorphic to the query graph Q.
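The effect of the editing matrix and the potential can be checked on a toy instance. The sketch below is only an illustration under stated assumptions: the 6-node graph, the node ordering (query nodes first), the helper name `editing_matrix`, and the potential value 10 are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def laplacian(A):
    """L = D - A for a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def editing_matrix(L, n_q):
    """Block editing matrix E, assuming the first n_q rows/columns are the query part."""
    L12 = L[:n_q, n_q:]
    L21 = L[n_q:, :n_q]
    E = np.zeros_like(L)
    E[:n_q, :n_q] = -np.diag(L12.sum(axis=1))   # -diag(L12 1): degree correction
    E[:n_q, n_q:] = L12                          # cross-community entries
    E[n_q:, :n_q] = L21
    E[n_q:, n_q:] = -np.diag(L21.sum(axis=1))   # -diag(L21 1): degree correction
    return E

# Toy graph: a triangle {0, 1, 2} (the query) joined to a path {3, 4, 5} by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
n, n_q = 6, 3
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = laplacian(A)
E = editing_matrix(L, n_q)
L_Q = laplacian(A[:n_q, :n_q])            # Laplacian of the query alone

v = np.zeros(n)
v[n_q:] = 10.0                             # large potential outside the query part
H = L - E + np.diag(v)                     # Hamiltonian of the corrected Laplacian
low = np.linalg.eigvalsh(H)[:n_q]          # the n_q smallest eigenvalues
```

On this instance `L - E` has no cross-block entries left, and `low` coincides with the spectrum of `L_Q`, because the potential shifts every eigenvalue of the second block above the largest eigenvalue of the triangle.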
Thus, the solution v renders the $|V_Q|$ smallest eigenvalues of the corrected Laplacian indistinguishable from the spectrum of Q, $\lambda_Q$, i.e., $\lambda(L - E + \mathrm{diag}(v)) = \lambda_Q$, where, with a slight abuse of notation, $\lambda(L - E + \mathrm{diag}(v))$ refers to the $|V_Q|$ smallest eigenvalues of $L - E + \mathrm{diag}(v)$. Since both v and E are unknown, we optimize the objective:

$$\min_{v, E} \|\lambda(L - E + \mathrm{diag}(v)) - \lambda_Q\|_2^2 \quad \text{s.t.} \quad E = E^\top,\ E\mathbf{1} = 0,\ \mathrm{off}(L - E) \le 0,\ \|v\| = c. \tag{4}$$

This objective is not convex, yet it only depends on the spectrum, for which there exist efficient approximations (Cohen-Steiner et al., 2018); it leads to a solution even if the initial value of v is a noisy version of the ground truth. As constraints, we postulate that E should be: (i) symmetric, $E = E^\top$; (ii) row- (and, by symmetry, also column-) centered, $E\mathbf{1} = 0$, with every row summing to 0; and (iii) yielding only non-positive off-diagonal entries, $\mathrm{off}(L - E) \le 0$. In addition, we enforce that v be a point on the surface of a sphere of radius c, via the constraint $\|v\| = c$. Proposition 4.1 provides a sufficient condition on c for the optimality of Equation (4), considering the noiseless case where G exactly contains the subgraph Q.

Proposition 4.1. When $c > \sqrt{n - n_Q}\,\max(\lambda_Q)$, the global optimum of Equation (4) is obtained at

$$v_i = \begin{cases} 0 & \text{if } i \in V_Q \\ \dfrac{c}{\sqrt{n - n_Q}} & \text{otherwise} \end{cases} \tag{5}$$

with

$$\tilde{v} = \frac{v - \min(v)}{\max(v) - \min(v)}, \qquad S_{ij} = |\tilde{v}_i - \tilde{v}_j|\, A_{ij}, \qquad E = \mathrm{diag}(S\mathbf{1}) - S. \tag{6}$$

Proof. Let E be constructed from Equations (5)-(6). Then $L - E$ is the Laplacian of a graph composed of two disjoint components, one of which is exactly the component indicated by Equation (5), i.e., the query subgraph Q. Hence there is a permutation Π such that $\Pi (L - E) \Pi^\top$ is a block diagonal matrix with the Laplacian of each component on the diagonal. Without loss of generality, we assume that the Hamiltonian operator attains this block diagonal form:

$$L - E + \mathrm{diag}(v) = \begin{pmatrix} L_Q & 0 \\ 0 & L_{\bar{Q}} + \dfrac{c}{\sqrt{n - n_Q}}\, I \end{pmatrix}$$

where $L_{\bar{Q}}$ denotes the Laplacian of the component induced by the nodes outside $V_Q$. When c satisfies the stated condition, the spectrum of the bottom-right block contains only eigenvalues larger than $\max(\lambda_Q)$, since $L_{\bar{Q}}$ is positive semi-definite and the added diagonal shifts its eigenvalues by $c / \sqrt{n - n_Q} > \max(\lambda_Q)$. It follows that the first $n_Q$ eigenvalues of $L - E + \mathrm{diag}(v)$ are exactly those of $L_Q$, rendering the objective of Equation (4) equal to zero.

In effect, by Proposition 4.1, we can recover the optimal solution if v is appropriately normalized and c is no less than a certain value. We exploit this result in Section 4.2 to design our algorithm by numerical optimization. We first introduce a regularization term.

Regularization. The objective in Equation (4) does not prevent v from taking arbitrary values. However, since $L - E$ has two connected components, v plays a role similar to that of the Fiedler vector in the minimization of the normalized cut (Shi & Malik, 2000). This observation leads us to the spectral regularization $v^\top (L - E)\, v$ that encourages v to take values in the null-space of $L - E$. In other words, the spectral regularizer drives v to be a stepwise function. We combine the spectral regularization with our objective as follows:

$$\min_{v, E}\ \underbrace{\|\lambda(L - E + \mathrm{diag}(v)) - \lambda_Q\|_2^2}_{\text{Data term}} + \mu\, \underbrace{v^\top (L - E)\, v}_{\text{Spectral regularizer}} \quad \text{s.t.} \quad E = E^\top,\ E\mathbf{1} = 0,\ \mathrm{off}(L - E) \le 0,\ \|v\| = c \tag{8}$$

where µ ≥ 0 is a regularization coefficient.

Corollary. Proposition 4.1 applies also with the spectral regularization term in Equation (8).

Proof. Let E be constructed from Equations (5)-(6). Then $L - E$ is the Laplacian of a graph composed of two disjoint components, one of which is exactly indicated by Equation (5), i.e., the query subgraph Q. The vector v in Equation (5) is constant on each component and therefore belongs to the null-space of $L - E$, rendering the regularization term 0; hence Equation (5) also provides the global minimum of Equation (8).
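Proposition 4.1 and its corollary can be sanity-checked numerically on a small noiseless instance. The following is a sketch under assumptions: the toy graph (a triangle attached to a path), the choice c = 6, and the function names `edit_from_potential` and `ssl_objective` are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def laplacian(A):
    """L = D - A for a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def edit_from_potential(v, A):
    """E = diag(S 1) - S with S_ij = |v~_i - v~_j| A_ij (assumes v is not constant)."""
    v_t = (v - v.min()) / (v.max() - v.min())
    S = np.abs(v_t[:, None] - v_t[None, :]) * A
    return np.diag(S.sum(axis=1)) - S

def ssl_objective(v, E, L, lam_q, mu):
    """Data term plus mu times the spectral regularizer."""
    M = L - E
    lam = np.linalg.eigvalsh(M + np.diag(v))[:len(lam_q)]
    return np.sum((lam - lam_q) ** 2) + mu * (v @ M @ v)

# Toy graph: triangle {0, 1, 2} (the query) joined to a path {3, 4, 5} by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
n, n_q = 6, 3
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = laplacian(A)
lam_q = np.linalg.eigvalsh(laplacian(A[:n_q, :n_q]))   # spectrum of the query

c = 6.0                                   # satisfies c > sqrt(n - n_q) * max(lam_q) ~ 5.2
v = np.zeros(n)
v[n_q:] = c / np.sqrt(n - n_q)            # step potential: zero on the query nodes
E = edit_from_potential(v, A)             # editing matrix derived from v
value = ssl_objective(v, E, L, lam_q, mu=0.2)
```

Here `value` vanishes up to floating-point error and `v` lies on the sphere of radius c, in line with the proposition and the corollary for this particular instance.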

4.1. LOCALIZING DISCONNECTED SUBGRAPHS

A special case of subgraph localization is that of a graph with a number of connected components, one of which corresponds to the query graph Q. In this case the editing matrix is E = 0, leading to the simpler objective:

$$\min_{v} \|\lambda(L + \mathrm{diag}(v)) - \lambda_Q\|_2^2 + \mu\, v^\top L\, v \quad \text{s.t.} \quad \|v\| = c.$$

4.2. NUMERICAL OPTIMIZATION

We exploit Proposition 4.1 to craft a numerical procedure that minimizes the objective in Equation (4), alternately optimizing for E and v. In the first iteration q = 0, we initialize $E_q = 0$. In iteration q + 1 we minimize

$$f(v, E_q) = \|\lambda(L - E_q + \mathrm{diag}(v)) - \lambda_Q\|_2^2 + \mu\, v^\top (L - E_q)\, v$$

for v given $E_q$:

$$v_{q+1} = \arg\min_{v : \|v\| = c} f(v, E_q),$$

via projected gradient descent, until convergence; iteration k + 1 of projected gradient descent performs the step

$$x_{k+1} = v_k - \alpha \nabla_v f(v_k, E_q), \qquad v_{k+1} = c\, \frac{x_{k+1}}{\|x_{k+1}\|},$$

where α > 0 regulates the learning rate. The gradient $\nabla_v$ for Equation 11 requires a differentiable eigendecomposition, which is achievable by extant methods (Wang et al., 2019). We subsequently update E according to:

$$\tilde{v} = \frac{v_q - \min(v_q)}{\max(v_q) - \min(v_q)}, \tag{12}$$

$$S_{ij} = |\tilde{v}_i - \tilde{v}_j|\, A_{ij}, \tag{13}$$

$$E_{q+1} = \mathrm{diag}(S\mathbf{1}) - S. \tag{14}$$

We obtain a threshold τ of the indicator function δ in Equation 1 for the nodes comprising the subgraph by splitting the elements of v into two clusters minimizing the sum-of-squares error from the mean (i.e., optimizing the k-means objective in one dimension) and compute the matrix E from this thresholded v by Equations 12-14.

The SSL algorithm. We now present our Spectral Subgraph Localization (SSL) algorithm (Algorithm 1 in the supplementary material) for Problem 1. SSL takes as input the adjacency matrix A of the full graph G and the spectrum of a query subgraph, and returns the vector v and the threshold τ of the indicator function δ; it additionally requires some hyperparameters, such as the number of outer iterations maxiter_out, the number of inner iterations maxiter_in, the learning rate α, and the regularization coefficient µ. We empirically found that the number of iterations and the learning rate do not significantly affect results across datasets if chosen within some range; we report those ranges and recommended values in Table 1 in the supplementary material.
On the other hand, the regularization coefficient µ in Equation 8 requires tuning for each dataset. We thus first normalize the value of µ by c 2 to remove the dependency on v's magnitude and then perform grid search on a range of values for µ to select an appropriate value. The optimization process alternates the projected gradient optimization in Equation 11 and the update of E using Equations 12-14 until it converges or reaches the maximum number of iterations maxiter out . Figure 2 illustrates the solution's progressive convergence through iterations, while Figure 3 shows an example result.
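The alternating procedure described in this section can be sketched in NumPy as follows. This is not the authors' implementation: instead of an automatic-differentiation eigendecomposition it uses the closed-form first-order derivative of an eigenvalue with respect to the diagonal (valid for simple eigenvalues), the random initialization is an assumption, the threshold is taken as the midpoint of the best two-cluster split, and the function names are hypothetical.

```python
import numpy as np

def laplacian(A):
    """L = D - A for a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def edit_from_potential(v, A):
    """E update: normalize v to [0, 1], weight each edge by |v~_i - v~_j|."""
    v_t = (v - v.min()) / (v.max() - v.min())       # assumes v is not constant
    S = np.abs(v_t[:, None] - v_t[None, :]) * A
    return np.diag(S.sum(axis=1)) - S

def pgd_potential(M, lam_q, v, c, mu, alpha, iters):
    """Projected gradient descent on the sphere ||v|| = c for fixed M = L - E."""
    k = len(lam_q)
    v = c * v / np.linalg.norm(v)
    for _ in range(iters):
        lam, U = np.linalg.eigh(M + np.diag(v))
        # For a simple eigenvalue, d(lambda_i)/d(v_j) = U[j, i] ** 2.
        grad = 2.0 * (U[:, :k] ** 2) @ (lam[:k] - lam_q) + 2.0 * mu * (M @ v)
        x = v - alpha * grad                        # gradient step
        v = c * x / np.linalg.norm(x)               # projection onto the sphere
    return v

def two_means_threshold(v):
    """Exact 1-D two-cluster split by scanning sorted split points (quadratic time)."""
    s = np.sort(v)
    best_sse, tau = np.inf, None
    for k in range(1, len(s)):
        sse = ((s[:k] - s[:k].mean()) ** 2).sum() + ((s[k:] - s[k:].mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, tau = sse, 0.5 * (s[k - 1] + s[k])
    return tau

def ssl_sketch(A, lam_q, c, mu=0.2, alpha=0.02, outer=3, inner=500, seed=0):
    """Alternate the v update and the E update; return v, tau, and the final E."""
    A = np.asarray(A, dtype=float)
    L = laplacian(A)
    E = np.zeros_like(L)
    v = np.random.default_rng(seed).random(len(A)) + 0.1   # assumed initialization
    for _ in range(outer):
        v = pgd_potential(L - E, lam_q, v, c, mu / c ** 2, alpha, inner)
        E = edit_from_potential(v, A)
    return v, two_means_threshold(v), E
```

A call such as `v, tau, E = ssl_sketch(A, lam_q, c=6.0)` returns a potential on the sphere of radius c; nodes with `v < tau` form the candidate localization. The sketch makes no claim about convergence to the ground truth on any particular graph.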

4.3. COMPLEXITY ANALYSIS

We derive the worst-case time complexity of the algorithm in the number of nodes n in the graph G. The eigendecomposition in Equation 11 takes O(n^3) per iteration; the computation in Equation 14 takes O(n^2) for the matrix-vector multiplication; 1-D k-means in Line 9 takes O(n) with the best algorithm (Grønlund et al., 2017). In effect, the total time is O(maxiter_out · (maxiter_in · n^3 + n^2 + n)), where the O(n^3) term dominates. However, as L - E + diag(v) is a graph's Laplacian, its spectrum can be efficiently approximated through sampling (Cohen-Steiner et al., 2018).

5. EXPERIMENTS

Here we empirically evaluate our method, SSL, on a number of datasets and against several hypotheses. Our evaluation aims to answer the following questions:
(Q1) Do the regularization term and the constraint ∥v∥ = c in Equation (8) help the localization?
(Q2) How does the conductance of the part corresponding to Q affect the quality of localization and how does SSL fare against state-of-the-art methods for graph alignment?
(Q3) What kind of graphs are challenging for SSL and why?

5.1. EXPERIMENT DESIGN

The code and data are available at https://anonymous.4open.science/r/SSL-F39A.

Hyperparameters. Unless stated otherwise, we choose maxiter_out = 3, maxiter_in = 500, a_tol = 10^-5, and α = 0.02. Regarding the regularization coefficient, we select µ = 0.2 through grid search. This choice achieves good accuracy across datasets and conductance levels.

Datasets. We evaluate SSL on the three real-world graphs from Rossi & Ahmed (2015) and two synthetic graphs generated by the Erdős-Rényi (ER) and Barabási-Albert (BA) models. The data characteristics are described in the supplementary material. Additionally, we generate graphs with community structure using the stochastic block model (SBM) (Holland et al., 1983).

Choosing Q. Given a number k, we generate a query of size |V_Q| = k from a real-world graph G to evaluate our subgraph localization method as follows.
1. Randomly select a node u, add it to V_Q, and place all its neighbors into a set N.
2. Randomly select a node u′ from N, add it to V_Q, and place in N all its neighbors not in V_Q.
3. Repeat the previous step until |V_Q| = k.
4. Set Q as the subgraph induced by V_Q in G.
For graphs generated by the stochastic block model, we set Q as the smallest community.

Quality measure. To evaluate performance in a manner independent of subgraph size, we use Balanced Accuracy (BA) (Brodersen et al., 2010); given the query node set V_Q and the node set V_S returned by a localization algorithm, balanced accuracy

$$\mathrm{BA}(v) = \frac{1}{2}\left( \frac{|V_Q \cap V_S|}{|V_Q|} + \frac{|\neg V_Q \cap \neg V_S|}{|\neg V_Q|} \right)$$

is the arithmetic mean of sensitivity (or recall) and specificity.
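The query-sampling steps and the quality measure above can be sketched with the Python standard library alone. The adjacency-dictionary representation, the function names, and the early stop when the frontier runs empty are assumptions made for this illustration.

```python
import random

def sample_query_nodes(adj, k, seed=None):
    """Grow a connected node set V_Q of size k from a random start node.

    adj maps each node to the set of its neighbours. If the start node's
    component has fewer than k nodes, the returned set is smaller than k.
    """
    rng = random.Random(seed)
    u = rng.choice(sorted(adj))
    V_Q = {u}
    N = set(adj[u]) - V_Q                 # step 1: frontier of candidate nodes
    while len(V_Q) < k and N:
        u2 = rng.choice(sorted(N))        # step 2: pick a random frontier node
        N.remove(u2)
        V_Q.add(u2)
        N |= set(adj[u2]) - V_Q           # add its neighbours not yet in V_Q
    return V_Q                            # step 4: Q is the subgraph induced by V_Q

def balanced_accuracy(V, V_Q, V_S):
    """Mean of sensitivity and specificity of the predicted node set V_S."""
    V, V_Q, V_S = set(V), set(V_Q), set(V_S)
    not_Q, not_S = V - V_Q, V - V_S
    sensitivity = len(V_Q & V_S) / len(V_Q)
    specificity = len(not_Q & not_S) / len(not_Q)
    return 0.5 * (sensitivity + specificity)

# Illustrative use on a cycle with 8 nodes.
cycle = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
query = sample_query_nodes(cycle, k=4, seed=1)
score = balanced_accuracy(range(10), {0, 1, 2, 3}, {0, 1, 2, 4})
```

In the last line, three of the four query nodes are recovered (sensitivity 3/4) and five of the six remaining nodes are correctly excluded (specificity 5/6), so `score` is their mean.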

5.2. ABLATION STUDY

We commence our study by examining how the terms in SSL's objective function (Equation 8) affect the result. Recall that the objective function consists of: (1) the data term, which drives the alignment between the spectrum of the part and that of the query, (2) a spectral regularization term that encourages v to be in the null space of L - E, and (3) the sphere constraint that enforces a constant norm on the potential v. To study the contribution of each term to the results, we compare SSL against two variants thereof:
1. A method optimizing only the data term ∥λ(L - E + diag(v)) - λ_Q∥_2^2.
2. A method optimizing a linear combination of the data term ∥λ(L - E + diag(v)) - λ_Q∥_2^2 and the spectral regularization v^⊤(L - E)v, without a sphere constraint.

We experiment on graphs with |V| = 200 nodes sampled from the stochastic block model, letting the size of the query subgraph increase from 20% of the graph to 45%. Figure 4a reports the results of this ablation study in terms of average balanced accuracy over 5 sampled graphs for each subgraph size. Unsurprisingly, optimizing the data term alone yields the worst results, although the method performs well on small subgraphs. The addition of the spectral regularizer and sphere constraint improves accuracy by up to 20%. For small subgraphs, the sphere constraint brings only marginal gains compared to the spectral regularization. On the other hand, on large query subgraphs, the sphere constraint boosts the accuracy by an additional 8%.

To further corroborate these results, Figure 4b shows an example of how the terms impact the potential v, on a 40-node graph sampled from the SBM with two communities of 20 nodes each; the query graph is one of the two communities. Ideally, we would like to obtain a v clearly separating values between the part corresponding to the query graph and the rest. In that case, we say that v forms a step function.
The optimization of the data term (left chart) alone leads to no clear separation between the two parts. Introducing the spectral regularization (middle chart) yields a result closer to a step function, though some nodes are incorrectly assigned to the part. Finally, the full objective in Equation 8 produces a clearly separated potential vector v. Visualizing the mapping of this potential to the graph G, we clearly recognize the part G_S as the light-colored nodes.

5.3. COMPETING METHODS

Here we assess our method against previous work. To the best of our knowledge, no extant unsupervised method is capable of answering localization queries in graphs with more than 15 nodes (Roy et al., 2022). Therefore, we compare SSL to the nearest feasible competitor, namely the state-of-the-art method for unsupervised graph alignment, CONE (Chen et al., 2020). To set up CONE so that it detects subgraphs, we inject into the query nodes with degree 0, so that the size of the query Q corresponds to that of the graph G, i.e., |V_Q| = |V|. We extract the ensuing localization vector as the matches of query nodes in G with the default hyper-parameter settings.

We examine accuracy as a function of the conductance of the query,

$$\Phi(V_Q) = \frac{\sum_{i \in V_Q,\, j \notin V_Q} A_{ij}}{\min\left( \sum_{i \in V_Q,\, j \in V} A_{ij},\ \sum_{i \notin V_Q,\, j \in V} A_{ij} \right)},$$

i.e., the ratio between the size of the cut among query Q and graph G and the minimum number of edges among the two resulting partitions. A graph's minimum conductance is associated (Cheeger, 2015) to its algebraic connectivity, i.e., its second smallest eigenvalue λ_2 (Fiedler, 1973). A larger conductance denotes more edges between the query subgraph and the rest of the graph, thus a harder subgraph localization instance. We use query subgraphs corresponding to 10%, 20% and 30% of the full graph size.

The results in Figure 5 show that SSL effectively localizes the query in real graphs. Accuracy gradually increases as the conductance approaches Φ(V_Q) = 0, finally settling at 100% accuracy on all datasets when the query is disconnected. In the Malaria dataset, we note a more abrupt increase. The performance of SSL is always comparable to, and most often exceeds, that of CONE. While performance drops as conductance grows, in real applications we would aim at detecting interesting subgraphs that exhibit distinguishable structures, such as social communities. Such subgraphs deviate substantially from both random and complete subgraphs. We model these nontrivial connectivity patterns by a lower conductance.
As conductance increases, the subgraph progressively becomes merged into other nodes, hence SSL cannot discriminate it. The results in Figure 6 show that SSL consistently outperforms CONE on synthetic graphs. As with real graphs, we observe a gradual accuracy increase as the graph becomes progressively disconnected. Notably, on ER graphs, SSL succeeds even at high conductance values (> 0.6).

Impact of the graph's spectrum. To better understand the performance of SSL on different graphs, we look at it through the lens of the graph's spectrum. Figure 7 shows the spectra of the real (Figure 5) and synthetic graphs (Figure 6) in our experiments, normalized in the range $[0, (\lambda_n - \lambda_2)/\lambda_n]$. First, we observe that the spectrum of synthetic graphs exhibits a gradual increase and a small difference between λ_2 and the maximum eigenvalue λ_n. By the generalized Cheeger inequality (Lee et al., 2010), the k-th order conductance, $\min_{V_1, V_2, \ldots, V_k} \max\{\Phi(V_i) : i = 1, 2, \ldots, k\}$, is related to the k-th eigenvalue. We conclude that, under gradual eigenvalue growth, the presence or absence of one edge does not affect the spectrum significantly, hence the projected gradient descent in SSL gracefully retrieves a good solution. On the other hand, the spectra of real graphs in our experiments exhibit an abrupt divergence between λ_2 and higher eigenvalues, indicating that a single edge may significantly affect the spectrum, rendering the task of projected gradient descent more challenging. In effect, SSL performs better as the gap between λ_2 and the rest of the eigenvalues decreases.
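The conductance used as the difficulty measure in this section can be computed directly from the adjacency matrix. The sketch below is illustrative only; the function name `conductance` and the toy graph are assumptions.

```python
import numpy as np

def conductance(A, query_nodes):
    """Cut weight between the query and the rest, over the smaller side's total degree."""
    A = np.asarray(A, dtype=float)
    in_q = np.zeros(len(A), dtype=bool)
    in_q[list(query_nodes)] = True
    cut = A[in_q][:, ~in_q].sum()         # edges leaving the query
    vol_q = A[in_q].sum()                 # sum of degrees inside the query
    vol_rest = A[~in_q].sum()             # sum of degrees outside the query
    return cut / min(vol_q, vol_rest)

# Toy graph: triangle {0, 1, 2} joined to a path {3, 4, 5} by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
phi = conductance(A, [0, 1, 2])
```

For this toy graph the cut has a single edge, the triangle side has total degree 7 and the path side 5, so `phi` equals 1/5; removing the edge (2, 3) would give a conductance of 0, the disconnected case.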

6. CONCLUSION

We studied the challenging problem of subgraph localization, which calls to find a set of nodes in a larger graph that best corresponds to a given subgraph. We devised a novel spectral solution that identifies the query match by adding a penalty to the Laplacian matrix so as to obtain a spectrum similar to that of the query graph. This novel approach requires solving a non-convex, non-smooth problem, for which we devised a numerical method. Our results demonstrate that our spectral method localizes query subgraphs more effectively than a baseline based on the state-of-the-art method for graph alignment. To our knowledge, this is the first endeavor in effective subgraph localization that can handle graphs with hundreds of nodes.

A SUPPLEMENTARY MATERIAL

We implemented SSL in 3.6 and ran experiments on an 8-core Intel Core i7-8565U machine with 16GB RAM. Code and data are available at https://anonymous.4open.science/r/SSL-F39A.

A.1 HYPERPARAMETER SETTING

We calibrated SSL using grid search on the hyperparameters maxiter_in, maxiter_out, a_tol, and α. The range of the tested hyperparameters is shown in Table 1. We observe no significant difference in the hyperparameters across datasets, vindicating the robustness of our method. Surprisingly, we observe the same robustness for the regularization parameter µ.

In the results of Figure 8, we observe that the spectrum of the query graph and that of the detected subgraph are well aligned. However, the localized subgraph deviates substantially from the ground truth. Similarly, in the results of Figure 9, while SSL does not perfectly align the two spectra, it yields a correlated spectrum. Nevertheless, SSL detects a subgraph comprising nodes that are only connected by one edge. In both cases, the challenge arises from the sensitivity of the spectrum at weakly connected parts of the graph. Changing the Laplacian in such parts by adding v has a larger impact on the spectrum than changing the Laplacian in a well-connected neighborhood. These types of graphs force the optimization process into a local optimum, as the optimizer has a large incentive to separate these weakly connected parts of the graph.

Algorithm 1 (SSL), final steps: 11: q ← q + 1; 12: return v_q, τ



Figure 1: An instance of subgraph localization (left) and its solution (right).

Figure 2: Alignment of the spectrum λ_Q of Q and the part of the spectrum λ(L - E + diag(v)) of G corresponding to Q at 1, 6, and 18 iterations. As the two spectra progressively converge, especially in the smaller eigenvalues, Q is correctly localized in G.

Figure 3: Example SSL result: ground-truth subgraph Q in blue; remaining nodes of G in red.

v values in ascending order and corresponding graph nodes.

Figure 4: Variants of the objective function; SSL's objective achieves the highest accuracy.

Figure 5 and Figure 6 present the average BA of 10 randomly generated connected subgraphs as a function of the conductance Φ(V_Q).

Figure 5: Accuracy vs. conductance between query subgraph Q and graph G, real graphs.

Figure 7: Graph spectra.

Figure 8: Alignment of the spectrum λ_Q of Q and the corresponding part of the spectrum λ(L - E + diag(v)) of G after convergence, ground truth V_S (blue) and V \ V_S (white), and corresponding localization by SSL; while the spectra are perfectly aligned, the detected subgraph is not the ground truth. The depicted graph is a protein-protein interaction network from the D&D dataset (Dobson & Doig, 2003).

Table 1: SSL hyperparameters and default values.

Table 2 presents the characteristics of the datasets used in the experimental evaluation in terms of the number of nodes |V|, the number of edges |E|, network type, and parameters.

Real graphs used in our evaluation: number of nodes |V|, number of edges |E|, and graph type.

Synthetic graphs used in our evaluation: number of nodes |V|, number of edges |E|, number of edges to attach from a new node to existing nodes m_new, edge creation probability p_new.

A.3 CHALLENGING CASES

In this section, we investigate some examples of challenging cases for SSL, illustrated in Figures 8 and 9.

