PHASE TRANSITION FOR DETECTING A SMALL COMMUNITY IN A LARGE NETWORK

Abstract

Detecting a small community in a large network is an interesting problem that includes clique detection as a special case, for which a naive degree-based χ²-test was shown to be powerful in the presence of an Erdős-Rényi background. Using Sinkhorn's theorem, we show that the signal captured by the χ²-test may be a modeling artifact, and that it may disappear once we replace the Erdős-Rényi model by a broader network model. We show that the recent SgnQ test is more appropriate for such a setting. The test is optimal in detecting communities with sizes comparable to the whole network, but has never been studied for our setting, which is substantially different and more challenging. Using a degree-corrected block model (DCBM), we establish phase transitions of this testing problem in terms of the size of the small community and the edge densities in the small and large communities. When the size of the small community is larger than √n, the SgnQ test is optimal, since it attains the computational lower bound (CLB), the information lower bound for methods allowing polynomial computation time. When the size of the small community is smaller than √n, we establish the parameter regime where the SgnQ test has full power and make some conjectures about the CLB. We also study the classical information lower bound (LB) and show that there is always a gap between the CLB and the LB in our range of interest.

1. INTRODUCTION

Consider an undirected network with n nodes and K communities. For convenience, we assume that n is large and the network is connected. We are interested in testing whether K = 1, or K > 1 with the sizes of some of the communities much smaller than n (communities are scientifically meaningful but mathematically hard to define; intuitively, they are clusters of nodes that have more edges "within" than "across" (Jin, 2015; Zhao et al., 2012)). The problem is a special case of network global testing, a topic that has received a lot of attention (e.g., Jin et al. (2018; 2021b)). However, existing works focused on the so-called balanced case, where the sizes of the communities are of the same order. Our case is severely unbalanced: the sizes of some communities are much smaller than n (e.g., n^ε). The problem also includes clique detection (a problem of primary interest in graph learning (Alon et al., 1998; Ron & Feige, 2010)) as a special case. Along this line, Arias-Castro & Verzelen (2014) and Verzelen & Arias-Castro (2015) have made remarkable progress. In detail, they considered the problem of testing whether a graph is generated from a one-parameter Erdős-Rényi model or a two-parameter model: for any nodes 1 ≤ i, j ≤ n, the probability that they have an edge equals b if i and j are both in a small planted subset, and equals a otherwise. A remarkable conclusion of these papers is that a naive degree-based χ²-test is optimal, provided that the clique size is in a certain range. Therefore, at first glance, it seems that the problem has been elegantly solved, at least to some extent.

Unfortunately, recent progress in network testing tells a very different story: the signal captured by the χ²-test may be a modeling artifact. It may disappear once we replace the models in Arias-Castro

