HIGHER-ORDER STRUCTURE PREDICTION IN EVOLVING GRAPH SIMPLICIAL COMPLEXES

Anonymous

Abstract

Dynamic graphs are rife with higher-order interactions, such as co-authorship relationships and protein-protein interactions in biological networks, that naturally arise between more than two nodes at once. Despite the ubiquity of such higher-order interactions, limited attention has been paid to the higher-order counterpart of the popular pairwise link prediction problem. Existing higher-order structure prediction methods are mostly based on heuristic feature extraction procedures, which work well in practice but lack theoretical guarantees. Such heuristics are primarily focused on predicting links in a static snapshot of the graph. Moreover, these heuristic-based methods fail to exploit the knowledge of latent substructures already present within the higher-order structures. In this paper, we overcome these obstacles by capturing higher-order interactions succinctly as simplices, modeling their neighborhoods by face-vectors, and developing a nonparametric kernel estimator for simplices that views the evolving graph from the perspective of a time process (i.e., a sequence of graph snapshots). Our method substantially outperforms several baseline higher-order prediction methods. As a theoretical contribution, we use Stein's method to prove that our estimator is consistent and asymptotically normal in terms of the Wasserstein distance.

1. INTRODUCTION

Numerous types of networks, such as social (Liben-Nowell & Kleinberg, 2007a), biological (Airoldi et al., 2006), and chemical reaction networks (Wegscheider, 1911), are highly dynamic: they evolve and grow rapidly via the appearance of new interactions, represented as new links (edges) between the nodes of a network. Identifying the underlying mechanisms by which such networks evolve over time is a fundamental question that is not yet fully understood. Typically, insight into the temporal evolution of networks has been obtained via a classical inferential problem called link prediction, where, given a snapshot of the network at time t along with its linkage pattern, the task is to assess whether a pair of nodes will be linked at a later time t' > t.

While inferring pairwise links is an important problem, many real-world graphs exhibit higher-order group-wise interactions that involve more than two nodes at once. Examples from human group behavior include a co-author relationship on a single paper and an e-mail sent to multiple recipients. In nature too, several proteins can interact simultaneously in a biological network. Despite their significance, relatively few works have studied the problem of predicting higher-order group-wise interactions, compared with single-edge inference. Benson et al. (2018) originally introduced the simplex to model group-wise interactions between nodes in a graph. They proposed predicting a simplicial closure event, whereby an open simplex (with just pairwise interactions between member vertices) transitions in the near future to a closed simplex (where all member vertices participate in the higher-order relationship simultaneously). Figure 1 (Middle) shows an example of such a transition from an open triangle to a closed one.
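To make the simplicial closure event concrete, the following minimal Python sketch classifies a vertex triple as an open or a closed triangle. The data representation (sets of `frozenset` objects) and the function name `triangle_state` are assumptions made for illustration only; they are not taken from Benson et al. (2018) or from this paper.

```python
from itertools import combinations


def triangle_state(triple, edges, closed_triangles):
    """Classify a 3-vertex set as 'closed', 'open', or 'neither'.

    edges: set of 2-element frozensets (pairwise interactions).
    closed_triangles: set of 3-element frozensets (joint interactions).
    """
    triple = frozenset(triple)
    if triple in closed_triangles:
        return "closed"
    # Open triangle: all three pairwise edges exist, but no joint interaction.
    if all(frozenset(pair) in edges for pair in combinations(sorted(triple), 2)):
        return "open"
    return "neither"


# Hypothetical snapshot at time t: only pairwise interactions among A, B, C.
edges = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("A", "C")]}
state_t = triangle_state("ABC", edges, set())

# Hypothetical snapshot at time t' > t: A, B, C now interact jointly.
state_t_next = triangle_state("ABC", edges, {frozenset("ABC")})

print(state_t, "->", state_t_next)
```

A simplicial closure event, in this toy representation, is the change of state from `"open"` to `"closed"` between two snapshots.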
Recently, several works have proposed modeling higher-order interactions as hyperedges in a hypergraph (Xu et al., 2013; Zhang et al., 2018; Yoon et al., 2020; Patil et al., 2020). Given a hyperedge h_t at time t, the inference task is to predict the future arrival of a new hyperedge h_{t'}, which covers a larger set of vertices than h_t and contains all the vertices in h_t. Figure 1 (Right) illustrates this hyperedge prediction task. Although prediction models based on either simplicial closure events or hyperedge arrival deal with higher-order structures, they both fail to capture the highly complex and non-linear evolution of these structures over time. Both kinds of models have two major limitations. First, they predict structures from a single static snapshot of the graph, and thus do not view the addition of new edges as a time process. Second, their feature extraction is mostly based on popular heuristics (Adamic & Adar, 2003; Brin & Page, 2012; Jeh & Widom, 2002; Zhou et al., 2009a; Barabási & Albert, 1999; Bhatia et al., 2019) that work well in practice but are not accompanied by strong theoretical guarantees. In addition to the aforementioned shortcomings, hypergraph-based methods model higher-order structures as hyperedges, which omit lower-dimensional substructures present within a single hyperedge. As a consequence, they cannot distinguish between various substructure relationships. For example, hyperedge [A, B, C] in Figure 1 (Right) cannot distinguish between group relationships like [[A, B], [B, C], [A, C]] (a set of pairwise interactions) versus [A, B, C] (all A, B and C simultaneously in a relationship). This problem is remedied by the use of simplices because they naturally model these substructures as a collection of subsets (i.e., faces) of the simplex.

We provide real-world examples of where our simplicial complex based approach can play a significant role. (i) Organic chemistry: It is quite common for the same set of elements to interact with each other in different configurations, which results in compounds that function very differently (Ma et al., 2011). Specifically, R-thalidomide and S-thalidomide are two different configurations of thalidomide, where the R-form was meant to help sedate pregnant women, while the S-form unfortunately resulted in birth defects. This is a famous example in stereochemistry that shows the consequences of mistaking two extremely close configurations (differing by a single bond) as being the same. Structure prediction that avoids such phenomena in drug synthesis allows chemists to achieve a much higher yield and avoid wasting expensive resources. (ii) Gene expression networks: Gene networks have nodes that represent genes, and edges connect genes that have similar expression patterns (Zhang & Horvath, 2005). Modules are subgraphs of tightly connected genes in such a gene expression network. Genomics research provides evidence that higher-order gene expression relationships (such as second- and third-order) and their measurements can have very important implications for cancer prognosis. When making structural predictions in these examples, our simplicial complex based approach provides much more fine-grained control than competing methods by capturing subtler differences in configurations.

To combat these challenges, our approach views the evolving graph¹ as a time process under the framework of nonparametric time series prediction, which models the evolution of higher-order structures (as simplices) and their local neighborhoods (spatial dimension) over a moving time window (temporal dimension). Our inference problem is then modeled as predicting the evolution to a higher-dimensional simplex at time t' > t, given a simplex at time t. It is important to note that this task is more general than, and diverges greatly from, the task proposed by Benson et al. (2018). Our

¹ We handle the incremental model (edge insertions only), as opposed to the harder fully dynamic model (edge insertions and deletions allowed), for which most previous methods also cannot provide theoretical guarantees.

Figure 1: [Left] Given a 4-node graph, at time t, the 2-simplex [A, B, C] also contains the 1-simplices [A, B], [B, C] and [A, C]. At time t' > t, the 2-simplex evolves (by connecting with D) to a 3-simplex [A, B, C, D], which additionally contains the 1-simplices [A, D], [B, D] and [C, D], along with the 2-simplices [A, B, D], [A, C, D] and [B, C, D]. [Middle] Simplex setting of Benson et al. (2018). The method predicts [A, B, C] (closed triangle) at time t' > t from an open triangle with links/edges [A, B], [B, C] and [A, C] at time t. [Right] A hypergraph represents a triple [A, B, C] as a hyperedge, without any of its subsets. It cannot distinguish between [[A, B], [B, C], [A, C]] and [A, B, C].
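The face structure described for Figure 1 (Left) can be checked with a short Python sketch. It enumerates the faces of a simplex as vertex subsets and lists the faces gained when the 2-simplex [A, B, C] evolves into the 3-simplex [A, B, C, D]. The helper names `faces_by_dim` and `new_faces` are hypothetical and chosen for illustration; this is not the paper's implementation.

```python
from itertools import combinations


def faces_by_dim(simplex):
    """Map each dimension d to the set of d-dimensional faces of `simplex`.

    Every non-empty subset of the vertex set is a face; a subset with d + 1
    vertices is a d-simplex. The simplex itself is included as its top face.
    """
    vertices = sorted(simplex)
    return {
        d: {frozenset(c) for c in combinations(vertices, d + 1)}
        for d in range(len(vertices))
    }


def new_faces(old_simplex, new_simplex):
    """Faces of `new_simplex` that are not faces of `old_simplex`, by dimension."""
    old, new = faces_by_dim(old_simplex), faces_by_dim(new_simplex)
    return {d: new[d] - old.get(d, set()) for d in new}


# Evolution from [A, B, C] at time t to [A, B, C, D] at time t' > t.
added = new_faces("ABC", "ABCD")
for d, fs in added.items():
    print(d, sorted("".join(sorted(f)) for f in fs))
```

For dimensions 1 and 2, the added faces are [A, D], [B, D], [C, D] and [A, B, D], [A, C, D], [B, C, D], matching the caption of Figure 1; the sketch also reports the new vertex D and the 3-simplex itself.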

