DEEP ENSEMBLES FOR GRAPHS WITH HIGHER-ORDER DEPENDENCIES

Abstract

Graph neural networks (GNNs) continue to achieve state-of-the-art performance on many graph learning tasks, but rely on the assumption that a given graph is a sufficient approximation of the true neighborhood structure. When a system contains higher-order sequential dependencies, we show that the tendency of traditional graph representations to underfit each node's neighborhood causes existing GNNs to generalize poorly. To address this, we propose a novel Deep Graph Ensemble (DGE), which captures neighborhood variance by training an ensemble of GNNs on different neighborhood subspaces of the same node within a higher-order network representation. We show that DGE consistently outperforms existing GNNs on semisupervised and supervised tasks on six real-world data sets with known higher-order dependencies, even under a similar parameter budget. We demonstrate that diverse and accurate base classifiers are central to DGE's success, and discuss the implications of these findings for future work on ensembles of GNNs.

1. INTRODUCTION

Graph neural networks (GNNs) solve learning tasks by propagating information through each node's neighborhood in a graph (Zhou et al., 2020; Wu et al., 2020). Most present work on GNNs assumes that a given graph is a sufficient approximation of the underlying neighborhood structure. But a growing body of work has challenged this assumption by showing that traditional graphs often cannot capture the higher-order structure and dynamics that govern many real-world systems (Lambiotte et al., 2019; Battiston et al., 2020; Porter, 2020; Torres et al., 2021; Battiston et al., 2021). In the present work, we couple GNNs with a specific family of graphs, higher-order networks (HONs), which encode sequential higher-order dependencies (i.e., conditional probabilities that cannot be explained by a first-order Markov model) in a graph structure.

A traditional graph, which we call a first-order network (FON), represents a system by decomposing it into a set of pairwise edges, so the only way to infer polyadic interactions is via transitive paths over adjacent nodes. When higher-order dependencies are present, these Markovian paths underfit the true neighborhood (Scholtes, 2017) and can thus produce many false positive interactions between nodes (Lambiotte et al., 2019). To address this limitation, Xu et al. (2016) proposed a HON that creates conditional nodes to more accurately encode the observed higher-order interactions. By preserving this additional information in the graph structure, HONs have produced new insights in studies of user behavior (Chierichetti et al., 2012), citation networks (Rosvall et al., 2014), human mobility and navigation patterns (Scholtes et al., 2014; Peixoto & Rosvall, 2017), the spread of invasive species (Saebi et al., 2020b), anomaly detection (Saebi et al., 2020d), disease progression (Krieg et al., 2020b), and more (Koher et al., 2016; Peixoto & Rosvall, 2017; Scholtes, 2017; Lambiotte et al., 2019; Saebi et al., 2020a).
However, their use with GNNs has not been thoroughly explored. As Figure 1 illustrates, the tendency of FONs to underfit has consequences for GNNs, which typically compute representations by recursively pooling features from each node's neighbors. In order to maximize GNN performance, we must ensure that local neighborhoods capture the true distribution of interactions in the system. To enable GNNs to utilize the additional information encoded in HONs, we propose a novel Deep Graph Ensemble (DGE), which uses independent GNNs to exploit variance in higher-order node neighborhoods and learn effective representations in graphs with higher-order dependencies. The key contributions of our work include:

1. We analyze the data-level challenges that fundamentally limit the ability of existing GNNs to learn effective models of systems with higher-order dependencies.
2. We introduce the notion of neighborhood subspaces by showing that neighborhoods in a HON are analogous to feature subspaces of first-order neighborhoods. Borrowing from ensemble methods, we then propose DGE to exploit the variance in these subspaces.
3. We experimentally evaluate DGE against eight state-of-the-art baselines on six real-world data sets with known higher-order dependencies, and show that, even with similar parameter budgets, DGE consistently outperforms baselines on semisupervised (node classification) and supervised (link prediction) tasks.[1]
4. We demonstrate that DGE's ability to train accurate and diverse classifiers is central to strong performance, and show that ensembling multiple GNNs with separate parameters is a consistent way to maximize the trade-off between accuracy and diversity.

Figure 1: A toy example of challenges faced by GNNs in modeling systems with higher-order dependencies. A FON ($G_1$) underfits the higher-order dependencies in the observed paths. Consequently, a GNN will learn similar representations for A and B, since they share the same 2-hop neighborhood in $G_1$. A HON ($G_k$, with $k = 2$ in this example) uses conditional nodes to encode higher-order dependencies. For example, node C|A represents the observed dependency that C only interacts with D when it also interacts with A (note that in real-world systems, $G_k$ rarely breaks the graph into multiple components). However, computing a representation for C then requires a GNN to aggregate multiple local neighborhoods. Colors depict node features.
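The contrast described for Figure 1 can be illustrated with a minimal Python sketch. The two paths below are hypothetical (they are not read off the figure), and `second_order_edges` is a deliberately simplified fixed-order (k = 2) construction that routes every observed triple through a conditional node. It is not the HON algorithm of Xu et al. (2016), and all function names are illustrative.

```python
from collections import defaultdict

# Hypothetical observed paths (an assumption, not data from the paper):
# C is followed by D only after A, and by E only after B.
paths = [("A", "C", "D"), ("B", "C", "E")]


def first_order_edges(paths):
    """Directed edges between entities that are adjacent in at least one path."""
    return {(u, v) for p in paths for u, v in zip(p, p[1:])}


def second_order_edges(paths):
    """Simplified k = 2 sketch: each triple (a, b, c) becomes a -> b|a -> c."""
    edges = set()
    for p in paths:
        for a, b, c in zip(p, p[1:], p[2:]):
            cond = f"{b}|{a}"  # conditional node: "b, given that it was reached from a"
            edges.add((a, cond))
            edges.add((cond, c))
    return edges


def k_hop(edges, source, k=2):
    """Nodes reachable from `source` in at most k directed hops (source excluded)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    frontier, seen = {source}, set()
    for _ in range(k):
        frontier = {v for u in frontier for v in adj[u]} - seen - {source}
        seen |= frontier
    return seen


fon = first_order_edges(paths)
hon = second_order_edges(paths)

# In the first-order graph, A and B reach exactly the same nodes within 2 hops.
fon_a, fon_b = k_hop(fon, "A"), k_hop(fon, "B")
# With conditional nodes, the two neighborhoods no longer coincide.
hon_a, hon_b = k_hop(hon, "A"), k_hop(hon, "B")
```

Under these assumed paths, the first-order neighborhoods of A and B are identical, so a neighborhood-pooling model receives the same structural input for both, whereas the conditional nodes keep the two neighborhoods apart.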

2. BACKGROUND AND PRELIMINARIES

2.1 HIGHER-ORDER NETWORKS

Let $S = \{S_1, S_2, \dots, S_n\}$ be a set of observed paths (e.g., flight itineraries, disease trajectories, or user clickstreams), where each $S_i = \langle s_1, s_2, \dots, s_m \rangle$ is a sequence of entities (e.g., airports, diagnosis codes, or web pages). Let $A = \bigcup_{i=1}^{n} S_i$ denote the set of entities across all sequences. By using a graph to summarize $S$, we can model the global function of each entity in the system and solve a number of useful learning problems. For example, we can predict disease function via node classification or interactions between airports using link prediction.[2] However, there is a large space of possible graphs that can represent $S$. We consider two: a FON, and the HON introduced by Xu et al. (2016). In a FON $G_1 = (V_1, E_1)$, the node set $V_1 = A$ (or, more generally, the mapping $f : V_1 \to A$ is bijective), and the edge set $E_1$ is the set of node pairs $(u, v) \in V_1 \times V_1$ that are adjacent elements in at least one $S_i$.
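The FON definition above can be written in set-builder form. The sketch below assumes that edges are ordered pairs following the direction of each sequence (the text only says "adjacent") and that the length $m$ may differ between sequences; the small two-path example uses hypothetical entities $a, b, c, d$.

```latex
% Entity set and node set (each sequence is treated as the set of its elements)
A = \bigcup_{i=1}^{n} \{\, s : s \in S_i \,\}, \qquad V_1 = A

% Edge set: u is immediately followed by v in at least one observed path
E_1 = \bigl\{\, (u, v) \in V_1 \times V_1 \;:\;
        \exists\, S_i = \langle s_1, \dots, s_m \rangle \in S,\
        \exists\, j \in \{1, \dots, m-1\}
        \text{ such that } s_j = u \text{ and } s_{j+1} = v \,\bigr\}

% Hypothetical worked example
S = \{\, \langle a, b, c \rangle,\ \langle a, b, d \rangle \,\}
\;\Longrightarrow\;
V_1 = \{a, b, c, d\}, \quad
E_1 = \{\, (a, b),\ (b, c),\ (b, d) \,\}
```

In this example the edge $(a, b)$ appears once in $E_1$ even though it is observed in both paths, since $E_1$ is defined as a set of pairs rather than a multiset.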



[1] Code and 3 data sets are available at https://github.com/sjkrieg/dge.

[2] This is distinct from sequence models like transformers, which typically predict an entity's local function within a single sequence.

