DESCO: TOWARDS SCALABLE DEEP SUBGRAPH COUNTING

Abstract

Subgraph counting is the problem of determining the number of occurrences of a given query graph in a large target graph. Despite being a #P-complete problem, subgraph counting is a crucial graph analysis method in domains ranging from biology and social science to risk management and software analysis. However, the runtime of existing exact counting methods grows combinatorially as target and query sizes increase. Existing approximate heuristic methods and neural approaches fall short in accuracy due to the high dynamic range of labels, limited model expressive power, and the inability to predict the distribution of subgraph counts in the target graph. Here we propose DeSCo, a neural deep subgraph counting framework, which aims to accurately predict the count and distribution of query graphs on any given target graph. DeSCo uses canonical partition to divide the large target graph into small neighborhood graphs and predicts the canonical count objective on each neighborhood. The proposed partition method avoids missing or double-counting any patterns of the target graph. A novel subgraph-based heterogeneous graph neural network is then used to improve the expressive power. Finally, gossip correction improves counting accuracy via prediction propagation with learnable weights. Compared with state-of-the-art approximate heuristic and neural methods, DeSCo achieves a 437× improvement in the mean squared error of count prediction and benefits from polynomial runtime complexity.

1. INTRODUCTION

Given a query graph and a target graph, the problem of subgraph counting is to count the number of patterns, defined as subgraphs of the target graph, that are graph-isomorphic to the query graph Ribeiro et al. (2021). Its application domains include biology Takigawa & Mamitsuka (2013); Solé & Valverde (2008); Adamcsek et al. (2006); Bascompte & Melián (2005); Bader & Hogue (2003), social science Uddin et al. (2013); Prell & Skvoretz (2008); Kalish & Robins (2006); Wasserman et al. (1994), risk management Ribeiro et al. (2017); Akoglu & Faloutsos (2013), and software analysis Valverde & Solé (2005); Wu et al. (2018). Although it is an essential method in graph and network analysis, subgraph counting is a #P-complete problem Valiant (1979). Due to the computational complexity, existing exact counting algorithms are restricted to small query graphs with no more than 5 vertices Pinar et al. (2017); Ortmann & Brandes (2017); Ahmed et al. (2015). The commonly used VF2 algorithm Cordella et al. (2004) fails to count even a single 5-node chain query within a one-week time budget on the large target graph Astro Leskovec et al. (2007), which has nineteen thousand nodes. Luckily, approximate counting of query graphs is sufficient in many real-world use cases Iyer et al. (2018); Kashtan et al. (2004); Ribeiro & Silva (2010). Approximation methods can scale to large targets by substructure sampling, random walk, and color-based sampling, allowing estimation of the frequency of query graph occurrences. Very recently, Graph Neural Networks (GNNs) have been employed as a deep learning-based approach to subgraph counting Zhao et al. (2021); Liu et al. (2020); Chen et al. (2020). The target graph and the query graph are embedded via a GNN, which predicts the motif count through a regression task.
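To make the problem definition and its cost concrete, the following is a minimal brute-force sketch in Python. It is not the DeSCo method and not one of the cited exact algorithms. It assumes simple undirected graphs given as edge lists, a query with no isolated nodes, and node-induced patterns (the induced setting is an assumption made here for illustration). The function name `count_induced_subgraphs` is hypothetical.

```python
from itertools import combinations, permutations


def count_induced_subgraphs(target_edges, query_edges):
    """Count node-induced subgraphs of the target that are isomorphic to the query.

    Illustrative brute force only. Assumes simple undirected graphs given as
    edge lists and a query in which every node touches at least one edge.
    """
    t_edges = {frozenset(e) for e in target_edges}
    q_edges = {frozenset(e) for e in query_edges}
    t_nodes = sorted({v for e in t_edges for v in e})
    q_nodes = sorted({v for e in q_edges for v in e})
    k = len(q_nodes)

    count = 0
    # Enumerate every k-node subset of the target: C(n, k) candidates.
    for subset in combinations(t_nodes, k):
        chosen = set(subset)
        # Edges of the target induced by the chosen nodes.
        induced = {e for e in t_edges if e <= chosen}
        if len(induced) != len(q_edges):
            continue
        # Try every bijection from query nodes to the chosen nodes: up to k! checks.
        for perm in permutations(subset):
            mapping = dict(zip(q_nodes, perm))
            mapped = {frozenset(mapping[v] for v in e) for e in q_edges}
            if mapped == induced:
                count += 1  # count each node subset (pattern) once
                break
    return count


# Target: a 4-cycle 0-1-2-3 with one chord 0-2.
target = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(count_induced_subgraphs(target, triangle))
```

The outer loop visits every k-node subset of the target and the inner loop tries up to k! node mappings, so the work grows combinatorially with both target and query size. This is the scaling behavior that motivates approximate and learned counting.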

