DETECTING SMALL QUERY GRAPHS IN A LARGE GRAPH VIA NEURAL SUBGRAPH SEARCH

Anonymous

Abstract

Recent advances have shown the success of using reinforcement learning and search to solve NP-hard graph-related tasks, such as Traveling Salesman Optimization, Graph Edit Distance computation, etc. However, it remains unclear how one can efficiently and accurately detect the occurrences of a small query graph in a large target graph, a core operation in graph database search, biomedical analysis, social group finding, etc. This task is called Subgraph Matching, which essentially performs a subgraph isomorphism check between a query graph and a large target graph. One promising approach to this classical problem is the "learning-to-search" paradigm, in which a reinforcement learning (RL) agent with a learned policy guides a search algorithm to quickly find the solution, without requiring any solved instances for supervision. However, for the specific task of Subgraph Matching, although the user-provided query graph is usually small, the target graph is often orders of magnitude larger. This poses challenges to the neural network design and can lead to solution and reward sparsity. In this paper, we propose NSUBS with two innovations to tackle these challenges: (1) a novel encoder-decoder neural network architecture that dynamically computes the matching information between the query and target graphs at each search state; (2) a novel look-ahead loss function for training the policy network. Experiments on six large real-world target graphs show that NSUBS significantly improves subgraph matching performance.

1. INTRODUCTION

With the growing amount of graph data that naturally arises in many domains, solving graph-related tasks via machine learning has gained increasing attention. Many NP-hard tasks, e.g., Traveling Salesman Optimization (Xing & Tu, 2020), Graph Edit Distance computation (Wang et al., 2021), and Maximum Common Subgraph detection (Bai et al., 2021), have recently been tackled via learning-based methods. These works on the one hand rely on search to enumerate the large solution space, and on the other hand use reinforcement learning (RL) to learn a good search policy from training data, thus obviating the need for the hand-crafted heuristics adopted by traditional solvers. Such a learning-to-search paradigm (Bai et al., 2021) also allows training the RL agent without any solved instances for supervision. However, how to design a neural network architecture under the RL-guided search framework remains unclear for the task of Subgraph Matching, which requires detecting all occurrences of a small query graph in an orders-of-magnitude larger target graph. Subgraph Matching has wide applications in graph database search (Lee et al., 2012), knowledge graph query (Kim et al., 2015), biomedical analysis (Zhang et al., 2009), social group finding (Ma et al., 2018), quantum circuit design (Jiang et al., 2021), etc. As a concrete example, Subgraph Matching is used for protein complex search in a protein-protein interaction network, to test whether the interactions within a protein complex in one species are also present in other species (Bonnici et al., 2013). Due to its NP-hard nature, state-of-the-art Subgraph Matching algorithms rely on backtracking search, with various techniques proposed to reduce the large search space (Sun & Luo, 2020; Kim et al., 2021; Wang et al., 2022).
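To make the backtracking-search formulation concrete, the following is a minimal, illustrative sketch of subgraph matching on labeled graphs. All names and the adjacency-dict graph representation here are our own illustrative choices, not part of any cited solver; real solvers layer candidate filtering and matching-order heuristics on top of this basic loop, and the learning-to-search paradigm replaces such heuristics with a learned policy over which target node to try next.

```python
def subgraph_matches(q_adj, q_label, t_adj, t_label):
    """Yield every injective mapping of query nodes to target nodes such
    that node labels match and every query edge maps to a target edge
    (non-induced subgraph isomorphism). Graphs are adjacency dicts:
    node -> set of neighbor nodes."""
    order = list(q_adj)  # a fixed matching order; heuristics would reorder this

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)  # a complete match found
            return
        u = order[len(mapping)]  # next query node to match
        for v in t_adj:  # candidate target nodes for u
            if v in mapping.values() or q_label[u] != t_label[v]:
                continue  # injectivity and label check
            # every already-matched neighbor of u must map to a neighbor of v
            if all(mapping[w] in t_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]  # backtrack

    yield from extend({})

# Toy example: query is a labeled triangle; target contains it.
q_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
q_label = {0: "A", 1: "B", 2: "B"}
t_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
t_label = {0: "A", 1: "B", 2: "B", 3: "C"}
matches = list(subgraph_matches(q_adj, q_label, t_adj, t_label))
# two occurrences: the two ways of mapping the "B" nodes onto nodes 1 and 2
```

The exponential number of candidate branches explored by `extend` is exactly the search space that candidate filtering and, in this paper, a learned policy aim to prune.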
However, these techniques are mostly driven by heuristics, and as a result, we observe that such solvers often fail to find any solution on large target graphs under a reasonable time limit, although they tend to work well on small graph pairs. We denote this phenomenon as solution sparsity. Solution sparsity requires the designed model not only to have enough capacity but also to run efficiently under a limited computational budget. Another consequence

