LEARNING TO BOOST RESILIENCE OF COMPLEX NETWORKS VIA NEURAL EDGE REWIRING

Abstract

The resilience of complex networks, a critical structural characteristic in network science, measures a network's ability to withstand noise corruption and structural changes. Improving resilience typically relies on minimal modifications of the network structure via degree-preserving edge rewiring. Despite their effectiveness, existing methods are learning-free and share the limitation of transduction: an edge rewiring strategy learned on one graph cannot be generalized to another. This limitation cannot be trivially addressed by existing graph neural network (GNN)-based approaches, since there are no rich initial node features from which GNNs could learn meaningful representations. However, neural edge rewiring relies on GNNs to obtain meaningful representations from pure graph topologies in order to select edges. We found that existing GNNs degenerate markedly when given only pure topologies on the resilience task, leading to undesired infinite action backtracking. In this work, inspired by persistent homology, we design a variant of GNN called FireGNN for learning inductive edge rewiring strategies. Building on the meaningful representations from FireGNN, we develop the first end-to-end inductive method, ResiNet, to discover resilient network topologies while balancing network utility. ResiNet reformulates network resilience optimization as a Markov decision process equipped with an edge rewiring action space and learns to select the correct edges successively. Extensive experiments demonstrate that ResiNet achieves a near-optimal resilience gain on various graphs while balancing the utility, and outperforms existing approaches by a large margin.

1. INTRODUCTION

Network systems, such as infrastructure systems and supply chains, are vulnerable to malicious attacks. To provide reliable services when facing natural disasters or targeted attacks, networked systems should continue to function and maintain an acceptable level of utility when the network partially fails. Network resilience, in the context of network science, is a measurement characterizing the ability of a network system to defend itself from such failures and attacks (Schneider et al., 2011). Studying the resilience of complex networks has found wide applications in many fields, ranging from ecology (Sole & Montoya, 2001), biology (Motter et al., 2008), and economics (Haldane & May, 2011) to engineering (Albert et al., 2004). To improve network resilience, many learning-free optimization methods have been proposed, typically falling into the categories of heuristic-based methods (Schneider et al., 2011; Chan & Akoglu, 2016; Yazıcıoglu et al., 2015; Rong & Liu, 2018) and evolutionary computation (Zhou & Liu, 2014). These methods improve the resilience of complex networks by minimally modifying graph topologies via a degree-preserving atomic operation called edge rewiring (Schneider et al., 2011; Chan & Akoglu, 2016; Rong & Liu, 2018). Concretely, for a given graph $G = (V, E)$ and two existing edges $AC$ and $BD$, an edge rewiring operation alters the graph structure by removing $AC$ and $BD$ and adding $AB$ and $CD$, where $AC, BD \in E$ and $AB, CD, AD, BC \notin E$. Edge rewiring has two advantages over simple addition or deletion of edges: 1) it preserves node degrees, whereas addition may violate capacity constraints; 2) it achieves minimal utility degradation in terms of graph Laplacian measurements, whereas addition/deletion may cause large network utility degradation (Jaume et al., 2020; Ma et al., 2021).
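The degree-preserving edge rewiring operation described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the graph is represented as a set of undirected edges (frozensets), and `can_rewire` checks exactly the feasibility constraints $AC, BD \in E$ and $AB, CD, AD, BC \notin E$ from the text.

```python
# Degree-preserving edge rewiring on an undirected graph stored as a set of
# frozenset edges. Removes AC and BD, adds AB and CD, after checking the
# feasibility constraints stated in the text.

def can_rewire(edges, a, c, b, d):
    """AC, BD must exist; AB, CD, AD, BC must not; all four nodes distinct."""
    if len({a, b, c, d}) < 4:
        return False
    e = lambda u, v: frozenset((u, v))
    return (e(a, c) in edges and e(b, d) in edges
            and all(e(u, v) not in edges
                    for u, v in [(a, b), (c, d), (a, d), (b, c)]))

def rewire(edges, a, c, b, d):
    """Return a new edge set with AC, BD replaced by AB, CD (degrees unchanged)."""
    assert can_rewire(edges, a, c, b, d)
    e = lambda u, v: frozenset((u, v))
    return (edges - {e(a, c), e(b, d)}) | {e(a, b), e(c, d)}

def degrees(edges):
    """Node degrees of an edge set; rewiring must leave this dict unchanged."""
    deg = {}
    for edge in edges:
        for v in edge:
            deg[v] = deg.get(v, 0) + 1
    return deg
```

Because the two removed edges and the two added edges touch the same four nodes exactly once each, `degrees` is invariant under `rewire`, which is the defining property exploited throughout the paper.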
Despite their success, learning-free methods share the following limitations:

• Transduction. Existing methods for selecting edges to rewire are transductive, since they search for the robust topology on a particular graph instance. This search is performed anew for every individual graph, without generalization.

• Local optimality. It is NP-hard to combinatorially choose two edges to rewire so as to obtain the globally optimal resilience (Mosk-Aoyama, 2008). Previous studies predominantly adopt greedy-like algorithms, yielding local optima in practice (Chan & Akoglu, 2016).

• Utility loss. Rewiring operations in network resilience optimization may lead to considerable degradation of network utility, which may jeopardize the network's functioning.

[Figure 1: Action backtracking in successive edge rewirings, arising because GNNs cannot provide distinguishable edge representations on graphs without rich features. After selecting $AC$ and $BD$ from $G_{t+2k}$ for rewiring at step $t+2k$, the agent selects $AB$ and $CD$ at step $t+2k+1$, returning to $G_{t+2k}$ and forming a cycle of action backtracking between $G_{t+2k}$ and $G_{t+2k+1}$.]

To our knowledge, there is no learning-based inductive network resilience optimization method. A key challenge is that many network science tasks, including resilience optimization, provide only pure topologies without rich node features. GNN-based learning paradigms have proved powerful in solving a large variety of graph tasks with rich features inductively (Li et al., 2018; Joshi et al., 2019; Fu et al., 2020; Khalil et al., 2017; Nazari et al., 2018; Peng et al., 2020; Yu et al., 2019). However, it remains opaque how to adapt such approaches to graph tasks where only the topological structure is available, especially tasks that require distinguishable node/edge representations to select the correct nodes/edges when constructing a solution sequentially. For example, Boffa et al.
(2022) showed that the performance of GNNs degenerates significantly in solving the traveling salesman problem (TSP) when node coordinate features are missing. Similarly, we empirically found that the popular framework combining GNNs and reinforcement learning (RL) fails to optimize network resilience: without meaningful edge representations, the RL agent gets stuck in the undesired infinite action backtracking shown in Figure 1. Readers are referred to Appendix E for a more detailed analysis. It is therefore necessary to devise a novel GNN applicable to the network resilience optimization task without rich features.

In this work, we overcome the above limitation of GNNs in modeling graphs without rich features and develop the first inductive learning-based method for discovering resilient network topologies using successive edge rewiring operations. Inspired by persistent homology and the approximation of the persistence diagram (Edelsbrunner & Harer, 2008; Aktas et al., 2019; Hofer et al., 2020; Horn et al., 2022), we design a purely topology-oriented variant of GNN called Filtration-enhanced GNN (FireGNN). FireGNN creates a series of subgraphs (the filtration) by successively removing the node with the highest degree from the graph, and then learns to aggregate node representations from each subgraph. FireGNN learns meaningful representations via this filtration process.

The main contributions of this paper are summarized as follows: 1) We propose the first learning-based method, ResiNet, to boost network resilience inductively, without rich node features, in a degree-preserving manner and with moderate utility loss. ResiNet casts resilience optimization as a successive sequential decision process of neural edge rewirings. Extensive experiments show that ResiNet achieves near-optimal resilience while balancing network utility and outperforms existing approaches by a large margin.
2) FireGNN, our technical innovation serving as the graph feature extractor, can learn meaningful representations from pure topological structures. FireGNN provides sufficient training signals to train an RL agent to learn successive edge rewiring operations inductively.

2. RELATED WORK

GNNs for graph-related tasks with rich features. GNNs are powerful tools for learning from relational data with rich features, providing meaningful representations for downstream tasks. Successful applications using GNNs as backbones include node classification (Kipf & Welling, 2017; Hamilton et al., 2017; Velickovic et al., 2018), link prediction (Li et al., 2020a; Kipf & Welling, 2017; Hamilton et al., 2017), graph property estimation (Xu et al., 2019; Kipf & Welling, 2017; Li et al., 2020a; Bodnar et al., 2021), and combinatorial problems on graphs (e.g., TSP (Li et al., 2018; Joshi et al., 2019; Fu et al., 2020; Khalil et al., 2017; Hudson et al., 2022), the vehicle routing problem (Nazari et al., 2018; Peng et al., 2020; Yu et al., 2019), graph matching (Yu et al., 2021), and adversarial attacks on GNNs (Ma et al., 2021; Dai et al., 2018)). However, it remains unclear how to adapt GNNs to graph tasks without rich features (Zhu et al., 2021), such as the resilience optimization task we focus on. Current topology-based GNNs like TOGL (Horn et al., 2022) still rely on distinct node features for calculating the filtration, whereas our proposed FireGNN addresses this by creating a temporally related filtration and learning to aggregate over it (Sec. 4.2).

Graph rewiring. Graph rewiring is typically used in the GNN community to build novel classes of GNNs by preprocessing a given graph, e.g., to overcome the over-squashing problem in training GNNs. For example, Klicpera et al. (2019) developed graph diffusion convolution (GDC) to improve GNN performance on downstream tasks by replacing message passing with graph diffusion convolution. Topping et al. (2022) proposed an edge-based combinatorial curvature to help alleviate the over-squashing phenomenon in GNNs. To our knowledge, there is currently no inductive learning-based graph rewiring method, and existing graph rewiring methods rely on rich features to better train GNNs on downstream tasks.
The edge rewiring operation used in our paper is a special graph rewiring operator that preserves node degrees.

Extended related work. Related work on network resilience, network utility, graph structure learning, multi-view graph augmentation for GNNs, and deep graph generation is deferred to Appendix A.

3. PROBLEM DEFINITION

An undirected graph is defined as $G = (V, E)$, where $V = \{1, 2, \ldots, N\}$ is the set of $N$ nodes, $E$ is the set of $M$ edges, $A \in \{0, 1\}^{N \times N}$ is the adjacency matrix, and $F \in \mathbb{R}^{N \times d}$ is the $d$-dimensional node feature matrix. The degree of node $i$ is defined as $d_i = \sum_{j=1}^{N} A_{ij}$, and a node with degree 0 is called an isolated node. Let $\mathcal{G}_G$ denote the set of graphs with the same node degrees as $G$. Given the network resilience metric $R(G)$ (Schneider et al., 2011) and the utility metric $U(G)$ (Latora & Marchiori, 2003; Boccaletti et al., 2006), the objective of boosting the resilience of $G$ is to find a target graph $G^\star \in \mathcal{G}_G$ that maximizes the network resilience while balancing the network utility. Formally, the problem of maximizing the resilience of complex networks is formulated as

$$G^\star = \arg\max_{G' \in \mathcal{G}_G} \; \alpha \cdot R(G') + (1 - \alpha) \cdot U(G').$$

To satisfy the degree-preserving constraint, edge rewiring is the default atomic operation for obtaining new graphs $G'$ from $G$. Combinatorially, a total of $T$ successive edge rewiring steps has complexity $O(E^{2T})$. The definitions of the resilience and utility metrics are deferred to Sec. 5.1.
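To make the search space concrete, the following sketch enumerates all feasible single rewirings of a graph and greedily picks the one maximizing a user-supplied score (standing in for $\alpha \cdot R + (1-\alpha) \cdot U$). This is an illustration of the $O(|E|^2)$ per-step cost, not the paper's method; the score function here is a toy placeholder.

```python
# Enumerate feasible degree-preserving rewirings and take one greedy step.
# Each step scores O(|E|^2) candidates, so T successive steps explore a
# space of size O(|E|^{2T}) combinatorially.
from itertools import combinations

def feasible_rewirings(edges):
    """Yield (AC, BD, AB, CD) for every feasible rewiring of two edges."""
    E = set(edges)
    fs = lambda u, v: frozenset((u, v))
    for e1, e2 in combinations(E, 2):
        a, c = tuple(e1)
        # Both orientations of e2 cover the two distinct recombinations.
        for b, d in (tuple(e2), tuple(e2)[::-1]):
            if len({a, b, c, d}) == 4 and not (
                fs(a, b) in E or fs(c, d) in E
                or fs(a, d) in E or fs(b, c) in E
            ):
                yield e1, e2, fs(a, b), fs(c, d)

def greedy_step(edges, score):
    """Pick the single rewiring maximizing `score` (a stand-in objective)."""
    best, best_val = edges, score(edges)
    for e1, e2, f1, f2 in feasible_rewirings(edges):
        cand = (edges - {e1, e2}) | {f1, f2}
        v = score(cand)
        if v > best_val:
            best, best_val = cand, v
    return best
```

Greedy search of this kind is exactly the locally optimal baseline the paper contrasts ResiNet against: it maximizes the one-step gain and can get trapped far from the global optimum.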

4. PROPOSED APPROACH: RESINET

In this section, we formulate the task of boosting network resilience as a reinforcement learning task by learning to select two edges and rewire them successively. We first present the graph resilience-aware environment design and describe our innovation, FireGNN, in detail. Finally, we present the graph policy network that guides the edge selection and rewiring process.

As shown in Figure 2, the environment performs the resilience optimization in an auto-regressive, step-wise way through a sequence of edge rewiring actions. Given an input graph, the agent first decides whether to terminate. If not, it selects one edge from the graph to remove, receives the very edge it just selected as the auto-regression signal, and then selects another edge to remove. The four nodes of these two removed edges are recombined, forming two new edges to be added to the graph. The optimization process repeats until the agent decides to terminate. The detailed designs of the state, the action, the transition dynamics, and the reward are presented as follows.

State. The fully observable state is formulated as $S_t = G_t$, where $G_t$ is the current input graph at step $t$. The detailed node feature initialization strategy is given in Appendix C.3.

Action.

ResiNet is equipped with a node permutation-invariant, variable-dimensional action space. Given a graph $G_t$, the action $a_t$ selects two edges and the rewiring order. As shown in Figure 3, the agent first chooses an edge $e_1 = AC$ and a direction $A \to C$. Then, conditioning on the state, $e_1$, and the direction, the agent chooses an edge $e_2 = BD$ such that $AB, CD, AD, BC \notin E$, together with a direction $B \to D$. The heads of the two directed edges are reconnected as a new edge $AB$, and the tails as a new edge $CD$. Although $G_t$ is undirected, we propose to consider artificial edge directions, which effectively avoids redundancy in representing the action space, since $(A \to C, B \to D)$ and $(C \to A, D \to B)$ refer to the same rewiring operation. Our proposed action space thus halves the size of the original action space while remaining complete. In this way, the action space is the set of all feasible pairs $(e_1, e_2) \in E^2$, with a variable size no larger than $2|E|(|E| - 1)$.

Transition dynamics. The formulation of the action space implies that if the agent does not terminate at step $t$, the selected action must form a valid edge rewiring. This edge rewiring is executed by the environment, and the graph transits to the new graph. Note that in some other work, infeasible operations are also included in the action space to keep its size constant throughout the process (You et al., 2018; Trivedi et al., 2020); this reduces sample efficiency and causes biased gradient estimates (Huang & Ontañón, 2020). ResiNet instead takes advantage of a state-dependent variable action space composed of only feasible operations.

Reward. ResiNet aims to optimize resilience while balancing utility, forming a complicated and possibly unknown objective function. Nevertheless, by (Wakuta, 1995), an MDP that maximizes such a complicated objective is equivalent to an MDP that maximizes a linear combination of resilience and utility for some coefficient.
This fact motivates us to design the reward as the step-wise gain of such a linear combination:

$$R_t = \alpha \cdot \left( R(G_{t+1}) - R(G_t) \right) + (1 - \alpha) \cdot \left( U(G_{t+1}) - U(G_t) \right),$$

where $R(G)$ and $U(G)$ are the resilience and the utility functions, respectively. The cumulative reward $\sum_{t=0}^{T-1} R_t$ up to time $T$ is then the total gain of such a linear combination.

4.2. FIREGNN

Motivated by graph filtration in persistent homology (Edelsbrunner & Harer, 2008), we design the filtration-enhanced GNN, termed FireGNN, to model graphs without rich features, or even with only topology available. As shown in Figure 4, for a given input graph $G$, FireGNN transforms $G$ from a static graph into a temporal version consisting of a sequence of subgraphs, by repeatedly removing the node with the highest degree. Observing a sequence of nested subgraphs of $G$ grants FireGNN the capability to observe how $G$ evolves towards the empty graph. FireGNN then aligns and aggregates the node, edge, and graph embeddings from each subgraph, leading to meaningful representations at the node, edge, and graph levels. Formally, the filtration in FireGNN is constructed as

$$G^{(k-1)} = G^{(k)} - v_k, \quad v_k = \operatorname{argmax}_{v_i \in G^{(k)}} \mathrm{DEGREE}(v_i),$$
$$(V, \emptyset) = G^{(0)} \subset G^{(1)} \subset \cdots \subset G^{(N)} = G,$$
$$\mathcal{G} = [G^{(0)}, G^{(1)}, \ldots, G^{(N)}],$$

where $G^{(k)}$ denotes the remaining graph after removing the $N - k$ nodes with the highest degrees, $v_k$ denotes the node with the highest degree in the current subgraph $G^{(k)}$, $\mathrm{DEGREE}(\cdot)$ measures the node degree, $G^{(N)}$ is the original graph, and $G^{(0)}$ contains no edges. The sequence of nested subgraphs of $G$ is termed the filtrated graph $\mathcal{G}$.

Node embedding. A regular GNN operates only on the original graph $G$ to obtain the node embedding of each node $v_i$ as $h(v_i) = \phi(G^{(N)} = G)_i$, where $\phi(\cdot)$ denotes a standard GNN model. In FireGNN, using the top $K + 1$ subgraphs in the graph filtration, the final node embedding $h(v_i)$ of $v_i$ is obtained by

$$h(v_i) = \mathrm{AGG}_N\left( h^{(N-K)}(v_i), \ldots, h^{(N-1)}(v_i), h^{(N)}(v_i) \right),$$

where $\mathrm{AGG}_N(\cdot)$ denotes a node-level aggregation function, $h^{(k)}(v_i)$ is the node embedding of $v_i$ in the $k$-th subgraph $G^{(k)}$, and $K \in [N]$. In practice, $h^{(k)}(v_i)$ is discarded when calculating $h(v_i)$ if $v_i$ is isolated in or not included in $G^{(k)}$.

Edge embedding. The directed edge embedding $h^{(k)}(e_{ij})$ of the edge from node $i$ to node $j$ in each subgraph is obtained by combining the embeddings of its two end vertices in $G^{(k)}$ as

$$h^{(k)}(e_{ij}) = m_f\left( \mathrm{AGG}_{N \to E}\left( h^{(k)}(v_i), h^{(k)}(v_j) \right) \right),$$

where $\mathrm{AGG}_{N \to E}(\cdot)$ denotes an aggregation function for obtaining an edge embedding from its two end vertices (typically chosen from min, max, sum, difference, and multiplication), and $m_f(\cdot)$ is a multilayer perceptron (MLP) that ensures consistency between the dimensions of the edge embedding and the graph embedding. The final embedding of the directed edge $e_{ij}$ of the filtrated graph $\mathcal{G}$ is given by

$$h(e_{ij}) = \mathrm{AGG}_E\left( h^{(N-K)}(e_{ij}), \ldots, h^{(N-1)}(e_{ij}), h^{(N)}(e_{ij}) \right),$$

where $\mathrm{AGG}_E(\cdot)$ denotes an edge-level aggregation function.

Graph embedding. With the node embeddings $h^{(k)}(v_i)$ of each subgraph $G^{(k)}$ available, the graph embedding $h^{(k)}(G)$ of each subgraph $G^{(k)}$ is calculated by a readout function (e.g., mean, sum) over all non-isolated nodes in $G^{(k)}$:

$$h^{(k)}(G) = \mathrm{READOUT}\left( \left\{ h^{(k)}(v_i) \;\middle|\; v_i \in G^{(k)}, \, d_i^{(k)} > 0 \right\} \right).$$

The final graph embedding of the filtrated graph $\mathcal{G}$ is given by

$$h(G) = \mathrm{AGG}_G\left( h^{(N-K)}(G), \ldots, h^{(N-1)}(G), h^{(N)}(G) \right),$$

where $\mathrm{AGG}_G(\cdot)$ denotes a graph-level aggregation function.
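The filtration construction above can be sketched directly. This is a minimal illustration under two stated assumptions: graphs are adjacency dicts `{node: set(neighbors)}`, and a "removed" node keeps its slot as an isolated vertex so that $G^{(0)} = (V, \emptyset)$ as in the definition; degree ties are broken arbitrarily.

```python
# Build the FireGNN filtration [G^(0), ..., G^(N)] by repeatedly deleting
# the incident edges of the currently highest-degree node.

def filtration(adj):
    """Return the nested subgraph sequence, index k matching G^(k)."""
    g = {v: set(n) for v, n in adj.items()}
    removed = set()
    seq = [{v: set(n) for v, n in g.items()}]          # G^(N) = G
    for _ in range(len(g)):
        # v_k = argmax over remaining nodes of the current degree
        v = max((u for u in g if u not in removed),
                key=lambda u: len(g[u]))
        removed.add(v)
        for u in list(g[v]):
            g[u].discard(v)                            # drop incident edges
        g[v] = set()                                   # v stays, isolated
        seq.append({w: set(n) for w, n in g.items()})
    return list(reversed(seq))                         # [G^(0), ..., G^(N)]
```

A FireGNN-style model would then run a GNN on each of the top $K + 1$ entries of this list and aggregate the per-subgraph node, edge, and graph embeddings, as in the equations above.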

4.3. EDGE REWIRING POLICY NETWORK

Having presented the details of the graph resilience environment and FireGNN, in this section we describe the policy network architecture of ResiNet, which learns to select two existing edges for rewiring at each step. At time step $t$, the policy network uses FireGNN as the graph feature extractor to obtain the directed edge embeddings $h(e_{ij}) \in \mathbb{R}^{2|E| \times d}$ and the graph embedding $h(G) \in \mathbb{R}^d$ from the filtrated graph $\mathcal{G}_t$, and outputs an action $a_t$ representing the two selected edges to rewire, leading to the new state $G_{t+1}$ and the reward $R_t$. To be inductive, we adopt a special auto-regressive, node permutation-invariant, dimension-variable action space to model the selection of two edges from graphs of arbitrary sizes and node permutations. The mechanism for obtaining the action $a_t$ from the edge and graph embeddings is presented below; it further reduces the complexity from $O(TE^2)$ to $O(TE)$.

Auto-regressive latent edge selection. An edge rewiring action $a_t$ at time step $t$ involves predicting the termination probability $a_t^{(0)}$ and selecting two edges ($a_t^{(1)}$ and $a_t^{(2)}$) together with the rewiring order. The action space of $a_t^{(0)}$ is binary; however, selecting two edges jointly imposes a huge action space of size $O(|E|^2)$, which is too expensive to sample from even for a small graph. Instead of selecting the two edges simultaneously, we decompose the joint action $a_t$ into $a_t = (a_t^{(0)}, a_t^{(1)}, a_t^{(2)})$, where $a_t^{(1)}$ and $a_t^{(2)}$ are two existing edges that do not share any common node (recall that $a_t^{(1)}$ and $a_t^{(2)}$ are directed edges of an undirected graph). Thus the probability of $a_t$ factorizes as

$$P(a_t \mid s_t) = P(a_t^{(0)} \mid s_t)\, P(a_t^{(1)} \mid s_t, a_t^{(0)})\, P(a_t^{(2)} \mid s_t, a_t^{(0)}, a_t^{(1)}).$$

Predicting the termination probability.
The first policy network $\pi_0(\cdot)$ takes the graph embedding as input and outputs the probability distribution of the first action, which decides whether to terminate:

$$P(a_t^{(0)} \mid s_t) = \pi_0(h(G)),$$

where $\pi_0(\cdot)$ is implemented as a two-layer MLP. Then $a_t^{(0)} \sim \mathrm{Bernoulli}(P(a_t^{(0)} \mid s_t)) \in \{0, 1\}$.

Selecting edges. If the signal $a_t^{(0)}$ indicates continuing to rewire, two edges are then selected auto-regressively. The continue signal $a_t^{(0)}$ is fed into the edge selection as a one-hot encoding vector $l_c$. The second policy network $\pi_1(\cdot)$ takes the graph embedding and $l_c$ as input and outputs a latent vector $l_1 \in \mathbb{R}^d$. A pointer network (Vinyals et al., 2015) measures the proximity between $l_1$ and each edge embedding $h(e_{ij})$ in $\mathcal{G}$ to obtain the selection probability distribution of the first edge. Then, to select the second edge, the graph embedding $h(G)$, the embedding $h(e_t^{(1)})$ of the first selected edge, and $l_c$ are concatenated and fed into the third policy network $\pi_2(\cdot)$, which produces the latent vector $l_2$ for selecting the second edge via its own pointer network. The overall process is formulated as

$$l_1 = \pi_1([h(G), l_c]), \quad P(a_t^{(1)} \mid s_t, a_t^{(0)}) = f_1(l_1, h(e_{ij})), \; \forall e_{ij} \in E,$$
$$l_2 = \pi_2([h(G), h(e_t^{(1)}), l_c]), \quad P(a_t^{(2)} \mid s_t, a_t^{(1)}, a_t^{(0)}) = f_2(l_2, h(e_{ij})), \; \forall e_{ij} \in E,$$

where each $\pi_i(\cdot)$ is a two-layer MLP, $[\cdot, \cdot]$ denotes concatenation, $h(e_t^{(1)})$ is the embedding of the first selected edge at step $t$, and each $f_i(\cdot)$ is a pointer network.
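The pointer-network step above reduces edge selection to scoring every candidate edge against a latent query vector. A minimal numeric sketch, with plain dot products standing in for the learned networks $\pi_i$ and $f_i$ (which are not reproduced here):

```python
# Pointer-style edge selection: compare a latent query l with every edge
# embedding h(e_ij) and softmax the scores into P(a | s). The dot product
# is a stand-in for the learned proximity function of a pointer network.
import math

def pointer_scores(query, edge_embs):
    """Softmax over dot(query, h(e)) for each candidate edge embedding."""
    logits = [sum(q * h for q, h in zip(query, emb)) for emb in edge_embs]
    m = max(logits)                      # stabilize the softmax
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Because the scores are computed per edge, the cost of one selection is linear in $|E|$, which is how the factorized action $a_t = (a_t^{(0)}, a_t^{(1)}, a_t^{(2)})$ avoids the quadratic joint action space.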

5. EXPERIMENTS

In this section, we demonstrate the advantages of ResiNet over existing non-learning-based and learning-based methods: achieving superior network resilience, generalizing inductively to unseen graphs, and accommodating multiple resilience and utility metrics. Moreover, we show that FireGNN can learn meaningful representations from graph data without rich features, while current GNNs fail. Our implementation is open-sourced.

5.1. EXPERIMENTAL SETTINGS

Datasets. Synthetic datasets, the real EU power network (Zhou & Bialek, 2005), and Internet peer-to-peer networks (Leskovec et al., 2007; Ripeanu et al., 2002) are used to demonstrate the performance of ResiNet in transductive and inductive settings. The details of data generation and the statistics of the datasets are presented in Appendix C.1. Following conventional experimental settings in network science, the maximal node size is set to around 1000 (Schneider et al., 2011), taking into account that: 1) selecting two edges at each step has high complexity $O(E^2)$; 2) evaluating the resilience metric is time-consuming for large graphs.

Baselines. We compare ResiNet with existing graph resilience optimization algorithms, both learning-free and learning-based. Learning-free methods (upper half of Table 1) include hill climbing (HC) (Schneider et al., 2011), the greedy algorithm (Chan & Akoglu, 2016), simulated annealing (SA) (Buesser et al., 2011), and the evolutionary algorithm (EA) (Zhou & Liu, 2014). Since, to our knowledge, there is no previous learning-based baseline, we devise five counterparts of our method by replacing FireGNN with existing well-known powerful GNNs (DE-GNN (Li et al., 2020b), k-GNN (Morris et al., 2019), DIGL (Klicpera et al., 2019), ADGN (Sun & Wu, 2020), and SDRF (Topping et al., 2022)) (lower half of Table 1). The classical GIN model is used as the backbone (Xu et al., 2019). ResiNet's training setup is detailed in Appendix C.2.

Metrics. Following the conventional setting in network science, the resilience metrics used in our experiments include a graph connectivity-based measurement (Schneider et al., 2011) and spectrum-based measurements (adjacency matrix spectrum and Laplacian matrix spectrum). Utility metrics consist of global efficiency and local efficiency (Latora & Marchiori, 2003; Boccaletti et al., 2006). The detailed definitions are deferred to Appendix B.

5.2. COMPARISONS TO THE BASELINES

In this section, we compare ResiNet to the baselines in optimizing the combination of resilience and utility with weight coefficient α ∈ {0, 0.5}. Following the conventional setting, the graph connectivity-based metric is used as the resilience metric (Schneider et al., 2011) and global efficiency is used as the utility metric (Latora & Marchiori, 2003; Boccaletti et al., 2006). Table 1 records the metric gain and the required number of rewiring operations of different methods under the same rewiring budget. ResiNet outperforms all baselines consistently on all datasets. Notably, ResiNet may achieve this performance with far fewer rewiring operations, such as on BA-15 with α = 0. In contrast, despite searching over nearly all possible rewirings, the greedy algorithm is trapped in local optima (as it maximizes the one-step resilience gain) and is too expensive to optimize the resilience of a network with more than 300 nodes. For SA, the initial temperature and the temperature decay rate need to be carefully tuned for each network. EA performs suboptimally under a limited rewiring budget due to the numerous rewiring operations required by its internal operators (e.g., crossover). Learning-based methods using existing GNNs coupled with distance encoding cannot learn effectively compared to ResiNet, supporting our claim about the effectiveness of FireGNN on graphs without rich features.

5.3. ABLATION STUDY OF RESINET

In this section, we investigate the impact of the coefficient α in the objective on ResiNet and the effect of the filtration order K on FireGNN. To investigate the impact of α in the reward function, we run a grid search varying α from 0 to 1 and summarize the resilience gain, the utility gain, and their sum in Table 2. Table 2 shows that when we only optimize resilience (α = 0), the utility degrades. Similarly, the resilience decreases if we only optimize utility (α = 1). This suggests a general tradeoff between resilience and utility, consistent with their definitions. Despite this tradeoff, resilience gain and utility gain can be achieved simultaneously on BA-15 and BA-50, since the original graph usually has neither maximum resilience nor maximum utility. This gives almost every network an incentive to conduct such optimization when feasible.

In FireGNN, the filtration order K determines the total number of subgraphs involved in calculating the final node, edge, and graph embeddings. FireGNN degenerates to existing GNNs when K = 0. Table 1 validates the effectiveness and necessity of FireGNN: without it (i.e., with other GNNs as the backbone), ResiNet generally struggles to find a positive gain on graphs without rich features, since it cannot learn to select the correct edges from incorrect edge embeddings. The maximum K of each dataset is recorded in Appendix Table 7, which shows that the maximum K is around half the size of the graph, since gradually removing the node with the largest degree yields a fast graph filtration process. In our experiments, we use the maximum K for graphs with fewer than 50 nodes, and set K = 3 for graphs with more than 50 nodes (K = 1 for graphs with more than 200 nodes). To validate that ResiNet is not sensitive to K, we run a grid search on several datasets, optimizing resilience with K = 0, 1, 2, 3.
As shown in Appendix Table 5, the resilience is improved significantly with K > 0 and ResiNet performs well with K = 1 or K = 2. 

6. GENERALIZATION

To demonstrate the inductive ability of ResiNet, we first train ResiNet on two different datasets (BA-10-30 and BA-20-200) with the data setting listed in Appendix Table 3, and then evaluate its performance on a separate test dataset. The test dataset is not observed during training, and fine-tuning is not allowed. We report the average resilience gain over graphs of the same size for each dataset. The performance of ResiNet on BA-10-30 is shown in Figure 5, and the results on other datasets are deferred to Figure 9 in Appendix D. Figure 5 shows a nearly linear improvement of resilience with increasing graph size, consistent with the results in the transductive setting that larger graphs usually have more room to improve their resilience. Moreover, we conduct experiments demonstrating ResiNet's generalization to different utility and resilience metrics; the details are deferred to Appendix D.

7. CONCLUSION

We have proposed a learning-based inductive method, ResiNet, for discovering resilient network topologies with minimal changes to the graph structure. ResiNet is the first inductive method that formulates the task of boosting network resilience as an MDP of successive edge rewiring operations. Our technical innovation, FireGNN, motivated by persistent homology, serves as the graph feature extractor for graphs with only topologies available. FireGNN alleviates the insufficiency of current GNNs (including GNNs more powerful than the 1-WL test) in modeling graphs without rich features. By decomposing a graph into temporal subgraphs and learning to combine the individual representations from each subgraph, FireGNN learns meaningful representations on the resilience task that provide sufficient gradients for training an RL agent to select correct edges, where current GNNs fail due to infinite action backtracking. Our method is practically feasible as it balances network utility while boosting resilience. FireGNN is potentially general enough to be applied to various graph problems without rich features.

APPENDIX A EXTENDED RELATED WORK

Network resilience. Modern network systems are threatened by various malicious attacks, such as the destruction of critical nodes, critical connections, or critical subsets of the network via heuristic/learning-based attacks (Fan et al., 2020; Zhao et al., 2021; Zhang et al., 2017; Holme et al., 2002; Iyer et al., 2013; Grassia et al., 2021; Medya et al., 2020). Network resilience was proposed and proved to be a suitable measurement of the robustness and stability of a network system under such attacks (Schneider et al., 2011). To optimize network resilience, various defense strategies have been proposed to protect the network functionality from crashing while preserving the network's topology to some extent. Commonly used defense manipulations include adding additional edges (Li et al., 2019; Carchiolo et al., 2019), protecting vulnerable edges (Wang et al., 2014), and rewiring two edges (Schneider et al., 2011; Chan & Akoglu, 2016; Buesser et al., 2011). Among these manipulations, edge rewiring fits real-world applications well, as it induces fewer functionality changes to the original network and does not impose additional load on the vertices (it is degree-preserving) (Schneider et al., 2011; Rong & Liu, 2018; Yazıcıoglu et al., 2015). To date, there has been no learning-based inductive edge rewiring strategy for the resilience task.

Network utility. Network utility refers to a system's quality in providing a specific service, for example, transmitting electricity in power networks or transmitting packages in routing networks. A popular metric for network utility is network efficiency (Latora & Marchiori, 2003; Boccaletti et al., 2006). In much previous work, although network resilience could be improved, the utility may drop dramatically at the same time (Li et al., 2019; Carchiolo et al., 2019; Wang et al., 2014; Schneider et al., 2011; Chan & Akoglu, 2016; Buesser et al., 2011).
This contradicts the very idea of improving network resilience and is infeasible in real-world applications. Our goal is to enhance network resilience with moderate loss of network utility via edge rewiring.

Graph structure learning. Unlike the graph generation task, which focuses on the quality of the generated graph, graph structure learning (GSL) aims to jointly learn an optimized graph structure and the corresponding graph representations solely for better performance on downstream tasks. Although conceptually related, GSL differs from graph generation: GSL mostly cares about the downstream task, while graph generation focuses on the generated graphs (Jin et al., 2020; Zhu et al., 2021). Currently, GSL relies on the existence of rich features to construct a graph, while there are generally no rich features in graph generation. Moreover, GSL cannot control the graph's node degrees during graph optimization. We refer interested readers to a survey of GSL (Zhu et al., 2021), since our work is a constrained graph generation task unrelated to GSL.

Multi-view graph augmentation for GNNs. Multi-view graph augmentation is an efficient way to improve the expressive power of GNNs or to incorporate domain knowledge, adapted based on the task's prior (Hu et al., 2020). For example, GCC generates multiple subgraphs from the same ego network (Qiu et al., 2020). DGI maximizes the mutual information between global and local information (Velickovic et al., 2019). GCA adaptively incorporates various priors for the topological and semantic aspects of the graph (You et al., 2020). Hassani & Khasahmadi (2020) contrast representations from first-order neighbors and a graph diffusion. DeGNN (Jin et al., 2020) is an automatic graph decomposition algorithm that improves the performance of deeper GNNs. These techniques rely on the existence of rich graph features, and the resulting GNNs cannot work well on graphs without rich features.
In the resilience task, only the graph topological structure is available. Motivated by the calculation process of persistent homology (Edelsbrunner & Harer, 2008), we apply the filtration process to enhance the expressive power of GNNs on graphs without rich features.

Deep graph generation. Deep graph generation models learn the distribution of given graphs and generate novel graphs. Some works use the encoder-decoder framework, learning a latent representation of the input graph through the encoder and then generating the target graph through the decoder. For example, GCPN (You et al., 2018) incorporates chemistry domain rules into molecular graph generation. GT-GAN (Guo et al., 2018) proposes a GAN-based model for malware cyber-network synthesis. GraphOpt (Trivedi et al., 2020) learns an implicit model that discovers the underlying optimization mechanism of graph generation using inverse reinforcement learning. GFlowNet learns a stochastic policy for generating molecules with probability proportional to a given reward, based on flow networks and local flow-matching conditions (Bengio et al., 2021). Boosting network resilience in a degree-preserving way can be viewed as a constrained graph generation task. However, constrained graph generation is still under development, and none of the existing methods can generate graphs under the exact degree-preserving constraint required by the resilience task.

B DEFINITIONS OF DIFFERENT OBJECTIVE FUNCTIONS

In this section, we present resilience definitions and utility definitions used in our experiments.

B.1 RESILIENCE DEFINITIONS

Three kinds of resilience metrics are considered:

• The graph connectivity-based measurement is defined as (Schneider et al., 2011) $R(G) = \frac{1}{N}\sum_{q=1}^{N} s(q)$, where $s(q)$ is the fraction of nodes in the largest connected component of the remaining graph after removing $q$ nodes from graph $G$ according to a certain attack strategy. The range of possible values of $R$ is $[1/N, 1/2]$, where the two extreme values correspond to a star network and a fully connected network, respectively. Figure 6 shows that the failure of a dozen nodes can jeopardize the connectivity and utility of the EU power network.

• The spectral radius (SR) is the largest eigenvalue $\lambda_1$ of the adjacency matrix.

• The algebraic connectivity (AC) is the second smallest eigenvalue of the Laplacian matrix of $G$.
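The three metrics above can be sketched as follows, assuming an unweighted, undirected graph in networkx; the adaptive highest-degree attack is used here as one common choice of removal strategy:

```python
import networkx as nx
import numpy as np

def connectivity_resilience(G):
    """R(G) = (1/N) * sum_{q=1}^{N} s(q), with s(q) the fraction of nodes
    in the largest connected component after q removals (Schneider et al.,
    2011). Nodes are removed by an adaptive highest-degree attack."""
    H = G.copy()
    N = H.number_of_nodes()
    total = 0.0
    for _ in range(N):
        # Remove the node with the currently largest degree.
        target = max(H.degree, key=lambda kv: kv[1])[0]
        H.remove_node(target)
        largest = max((len(c) for c in nx.connected_components(H)), default=0)
        total += largest / N
    return total / N

def spectral_radius(G):
    """Largest eigenvalue of the adjacency matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(nx.to_numpy_array(G))[-1])

def algebraic_connectivity(G):
    """Second smallest eigenvalue of the graph Laplacian."""
    return float(np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray())[1])
```

As a sanity check, a fully connected graph attains the stated upper bound in the limit: for $K_5$, $R = (4+3+2+1+0)/5^2 = 0.4$, approaching $1/2$ as $N$ grows.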

B.2 UTILITY DEFINITIONS

In this paper, the global and local communication efficiency are used as two measurements of network utility, which are widely applied across diverse applications of network science, such as transportation and communication networks (Latora & Marchiori, 2003; Boccaletti et al., 2006). The average efficiency of a network $G$ is defined as inversely proportional to the average pairwise distance (Latora & Marchiori, 2001): $E(G) = \frac{1}{N(N-1)}\sum_{i \neq j \in V} \frac{1}{d(i,j)}$, where $N$ denotes the total number of nodes in the network and $d(i,j)$ is the length of the shortest path between nodes $i$ and $j$. Given the average efficiency, we can compute the global and local efficiency.

• The global efficiency of a network $G$ is defined as (Latora & Marchiori, 2001; 2003) $E_{global}(G) = \frac{E(G)}{E(G_{ideal})}$, where $G_{ideal}$ is the "ideal" fully connected graph on $N$ nodes; the range of $E_{global}(G)$ is $[0, 1]$.

• The local efficiency of a network $G$ measures a local average of pairwise communication efficiencies and is defined as (Latora & Marchiori, 2001) $E_{local}(G) = \frac{1}{N}\sum_{i \in V} E(G_i)$, where $G_i$ is the subgraph induced by node $i$'s one-hop neighbors, excluding node $i$ itself. The range of $E_{local}(G)$ is $[0, 1]$.
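A minimal sketch of these definitions for an unweighted networkx graph (networkx also ships built-in `nx.global_efficiency` and `nx.local_efficiency` with the same semantics):

```python
import networkx as nx

def average_efficiency(G):
    """E(G) = 1/(N(N-1)) * sum_{i != j} 1/d(i, j).
    Unreachable pairs have d = infinity and thus contribute 0; they are
    simply never yielded by the shortest-path iterator below."""
    N = G.number_of_nodes()
    if N < 2:
        return 0.0
    total = 0.0
    for src, lengths in nx.all_pairs_shortest_path_length(G):
        for dst, d in lengths.items():
            if src != dst:
                total += 1.0 / d
    return total / (N * (N - 1))

def global_efficiency(G):
    """E_global = E(G) / E(G_ideal), with G_ideal the complete graph on N nodes."""
    ideal = nx.complete_graph(G.number_of_nodes())
    return average_efficiency(G) / average_efficiency(ideal)

def local_efficiency(G):
    """E_local = (1/N) * sum_i E(G_i), with G_i the subgraph induced by
    node i's one-hop neighbors (excluding i itself)."""
    return sum(average_efficiency(G.subgraph(G[v])) for v in G) / G.number_of_nodes()
```

For an unweighted complete graph all pairwise distances equal 1, so both efficiencies attain their maximum of 1, matching the stated $[0, 1]$ range.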

C IMPLEMENTATION DETAILS OF RESINET

This section provides the implementation details of ResiNet, including the datasets, the network structure, the training strategies, and the node feature construction.

C.1 DATASET

We first present the data generation strategies. Table 3 summarizes the statistics of each dataset. Synthetic datasets are generated using the Barabasi-Albert (BA) model (known for producing scale-free graphs) (Albert & Barabási, 2002), with the graph size varying from |N|=10 to |N|=1000. During data generation, each new node is connected to two existing nodes for graphs with no more than 500 nodes, and to one existing node for graphs with around 1000 nodes. BA graphs are chosen since they are vulnerable to malicious attacks and are commonly used to test network resilience optimization algorithms (Bollobás & Riordan, 2004). We test the performance of ResiNet in both transductive and inductive settings.

• Transductive setting. The algorithm is trained and tested on the same network.
  - Randomly generated synthetic BA networks, denoted by BA-m with graph size m ∈ {15, 50, 100, 500, 1000}, are adopted to test the performance of ResiNet on networks of various sizes.
  - The Gnutella peer-to-peer file sharing network from August 2002 (Leskovec et al., 2007; Ripeanu et al., 2002) and the real EU power network (Zhou & Bialek, 2005) are used to validate the performance of ResiNet on real networks. A random walk sampling strategy is used to derive a representative subgraph with hundreds of nodes from the Gnutella network (Leskovec & Faloutsos, 2006).

• Inductive setting. Two groups of synthetic BA networks, denoted by BA-m-n, are randomly generated to test ResiNet's inductivity, where m is the minimal graph size and n the maximal graph size. We first randomly generate a fixed number of BA networks as training data to train ResiNet and then evaluate ResiNet's performance directly on the test dataset without any additional optimization.
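The generation protocol above can be sketched as follows. This is a hypothetical helper assuming networkx, where `nx.barabasi_albert_graph(n, m)` attaches each new node to `m` existing nodes:

```python
import random
import networkx as nx

def make_ba_dataset(num_graphs, min_n, max_n, seed=0):
    """Generate BA graphs following the stated protocol: each new node
    attaches to 2 existing nodes for graphs with at most 500 nodes,
    and to 1 existing node for larger graphs."""
    rng = random.Random(seed)
    graphs = []
    for _ in range(num_graphs):
        n = rng.randint(min_n, max_n)  # graph size within [min_n, max_n]
        m_attach = 2 if n <= 500 else 1
        graphs.append(nx.barabasi_albert_graph(n, m_attach, seed=rng.randrange(2**31)))
    return graphs
```

For instance, `make_ba_dataset(100, 20, 200)` would produce a BA-20-200-style training set; with attachment parameter 2, each graph on n nodes has exactly 2(n - 2) undirected edges.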

C.2 RESINET SETUP

In this section, we provide the detailed parameter settings and training strategies for ResiNet. Our proposed FireGNN is used as the graph encoder in ResiNet, with a 5-layer GIN (Xu et al., 2019) as the backbone. SELU activation (Klambauer et al., 2017) is used after each message-passing propagation. A graph normalization strategy is adopted to stabilize the training of the GNN (Cai et al., 2021). The jumping knowledge network (Xu et al., 2018) is used to aggregate node features from different layers of the GNN. The aggregation function of FireGNN is an attention-based linear weighted combination. The overall policy is trained using a highly tuned implementation of the proximal policy optimization (PPO) algorithm (Schulman et al., 2017). Several strategies critical for stabilizing and accelerating the training of ResiNet are used, including advantage normalization (Andrychowicz et al., 2021), dual-clip PPO with the dual-clip parameter set to 10 (Ye et al., 2020), and the use of different optimizers for the policy network and the value network. Additionally, since the step-wise reward range is small (around 0.01), we scale the reward by a factor of 10 to aid the training of ResiNet. The policy head model and the value function model use two separate FireGNN encoders with the same architecture. ResiNet is trained using two separate Adam optimizers (Kingma & Ba, 2015) with batch size 256 and linearly decayed learning rates of 0.0007 for both the policy network and the value network.

Table 3: Statistics of graphs used for resilience maximization. Both transductive and inductive settings (⋆) are included. Consistent with our implementation, we report the number of edges after transforming undirected graphs to directed graphs. The edge rewiring has a fixed execution order. For the inductive setting, we report the maximum number of edges. The action space size of the edge rewiring is measured by 2|E|.

Hardware: We run all experiments for ResiNet on a platform with two GeForce RTX 3090 GPUs and one AMD 3990X CPU.

C.3 NODE FEATURE CONSTRUCTION

The widely used node degree feature cannot significantly benefit the network resilience optimization of a single graph due to the degree-preserving rewiring. Therefore, we construct node features for each input graph to aid both transductive and inductive learning, including:

• The distance encoding strategy (Li et al., 2020b), of which the node degree feature is a part.

• An 8-dimensional position embedding, originating from the Transformer (Vaswani et al., 2017), as a measurement of the vulnerability of each node under attack. If the attack order is available, we can encode it directly into the position embedding. If the attack order is unknown, node degree, node betweenness, and other node priority metrics can be used to approximate node importance in practice. In our experiments, we use the adaptive node degree for the position embedding.
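As an illustration of the second feature, one plausible construction (a sketch under our assumptions, not necessarily the exact implementation) ranks nodes by degree as a proxy for the attack order and feeds each rank into the standard Transformer sinusoidal position embedding:

```python
import math
import networkx as nx

def degree_position_embedding(G, dim=8):
    """8-dimensional Transformer-style sinusoidal position embedding.
    A node's position is its rank under a degree-based attack priority:
    higher degree means attacked earlier, i.e., a smaller position index."""
    order = sorted(G.nodes, key=lambda v: G.degree[v], reverse=True)
    emb = {}
    for pos, v in enumerate(order):
        vec = []
        for i in range(dim // 2):
            # Standard sinusoidal frequencies from Vaswani et al. (2017).
            freq = pos / (10000 ** (2 * i / dim))
            vec.extend([math.sin(freq), math.cos(freq)])
        emb[v] = vec
    return emb
```

In a star graph the hub gets position 0, so its embedding is the alternating [0, 1, 0, 1, ...] pattern, while the leaves receive distinct later positions.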

C.4 BASELINE SETUP

All baselines share the same action space as ResiNet and use the same action masking strategy to block invalid actions. The maximal number of objective evaluations is consistent across all algorithms. Other settings of the baselines are consistent with the default values in their papers. An early-stopping strategy is used for the baselines: the search process terminates if no positive resilience gain is obtained within a fixed number of successive steps.

D EXTENDED EXPERIMENTAL RESULTS

In this section, we present additional experimental results showing that ResiNet generalizes to unseen graphs and to different utility and resilience metrics. Not surprisingly, the optimized network with an improvement of about 3.6% in defending against the betweenness-based attack also has higher resilience (around 7.8%) against the node degree-based attack. This may be explained by the similarity between node degree and betweenness for a small network.

D.2 INDUCTIVITY ON LARGER DATASETS

To demonstrate that ResiNet can learn to accommodate different utility and resilience metrics, we conduct experiments on BA-15 using multiple resilience and utility metrics. The Pareto points shown in Figure 7 denote the optima under different objectives on BA-15, implying that ResiNet can obtain an approximate Pareto frontier. Surprisingly, the initial gain in resilience (from around 0.21 to around 0.24) is obtained without loss of utility, which incentivizes almost every network to conduct such optimization to some extent when feasible. More results are included in Appendix D.1, and the optimized network structures are visualized in Figure 8 and Figure 11. Even with limited computational resources, armed with the autoregressive action space and the power of FireGNN, ResiNet can be trained fully end-to-end on graphs with thousands of nodes using RL.

We demonstrate the inductivity of ResiNet on graphs of different sizes by training ResiNet on the BA-20-200 dataset, which consists of graphs with sizes ranging from 20 to 200, and then report its performance in directly guiding the edge selections on unseen test graphs. The filtration order K is set to 1 due to computational limitations. As shown in Figure 9, ResiNet has the best performance for N ∈ [70, 100]. The degrading performance with graph size may be explained by the fact that larger graphs require a larger filtration order for ResiNet to work well. A more stable performance improvement of ResiNet is observed with increasing graph size when trained to optimize network resilience and utility simultaneously, suggesting that ResiNet finds a strategy to balance these two metrics.

D.2.1 THE EFFECT OF FILTRATION ORDER K ON FIREGNN

In this section, we report the ratio of the remaining edges in subgraphs versus the filtration order K in Table 6 and visualize it in Figure 10. The maximum filtration order K of FireGNN for each dataset is summarized in Table 7. Table 7 shows that the maximum filtration order K is around half the size of the graph, since we gradually remove the node with the largest degree in the filtration process.
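The filtration process can be sketched as follows (assuming networkx; `degree_filtration` is a hypothetical helper name): starting from the input graph, the node with the current largest degree is removed repeatedly, yielding a sequence of K+1 nested graph views for FireGNN to aggregate over.

```python
import networkx as nx

def degree_filtration(G, K):
    """Produce up to K+1 graph views [G_0, ..., G_K] by repeatedly removing
    the node with the currently largest degree, as in FireGNN's filtration."""
    views = [G.copy()]
    H = G.copy()
    for _ in range(K):
        if H.number_of_nodes() == 0:
            break  # the graph has already become empty
        target = max(H.degree, key=lambda kv: kv[1])[0]
        H.remove_node(target)
        views.append(H.copy())
    return views
```

Because hubs are removed first, the edge count drops quickly with K, which is consistent with the remaining-edge ratios reported in Table 6.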

D.3 INSPECTION OF OPTIMIZED NETWORKS

Moreover, to provide a deeper inspection of the optimized network structure, we take the EU power network as an example and visualize its structure together with the optimized networks given by ResiNet under different objectives. Compared to the original EU network, Figure 11(b) shows the network structure obtained by optimizing only the graph connectivity-based resilience. We observe a more crowded region on the left, consistent with the "onion-like" structure concluded in previous studies. If we consider the combined gain of both resilience and utility, we observe a more compact, clustered "crescent moon"-like structure, as shown in Figure 11(c).

In this section, we present the resilience improvement and the required number of edge rewirings of each algorithm under a large rewiring budget of 200. The running speed is also presented to compare the running time efficiency of each algorithm. As shown in Table 8, traditional methods improve the network resilience significantly compared to ResiNet under a large rewiring budget of 200. However, traditional methods remain undesirable in such a case, since a solution with a large rewiring budget is not applicable in practice due to the vast cost of adding many new edges to a real system. The actual rewiring number for EA is hard to calculate since it is a population-based algorithm, so it is omitted in Table 8. All baselines adopt the early-stopping strategy that they terminate if there is no positive resilience gain in 1000 successive steps.

Having no rich features means that the outputs of the GNNs are not distinguishable, so it is difficult for the RL agent to distinguish different vertices/edges, causing large randomness in the output of the policy. This may cause the rewiring process to alternate between two graphs, forming an infinite loop.
We suspect that this infinite loop failure may explain the poor empirical performance of optimizing network resilience by selecting edges with existing GNNs and reinforcement learning (RL). A frequent empirical failure of regular GNNs on the resilience task is the infinite action backtracking phenomenon, which proceeds as follows. Consider the graph $G_t$ with $N$ nodes containing two edges $AB$ and $CD$. The agent selects $AB$ and $CD$ for rewiring, leading to $G_{t+1}$ with new edges $AC$ and $BD$. The agent would then select $AC$ and $BD$ at step $t+1$, returning to $G_t$ and forming a cycle between $G_t$ and $G_{t+1}$. Formally, the infinite loop is formulated as

$((A, B), (C, D)) = \underset{(u,v),\,(p,q) \in E_t}{\operatorname{argmax}} \ \mathrm{SIM}\big((h^u_t, h^v_t, h^p_t, h^q_t),\, h_{G_t}\big),$

where SIM is a similarity metric, $h^i_t$ and $h_{G_t}$ are the embeddings of node $i$ and graph $G_t$ at step $t$, and $(A, B)$ is one edge.

Table 10 compares and summarizes the characteristics of different graph-related tasks. We can see that the resilience task is more challenging in many aspects. No prior rule such as action masking or a negative penalty can be used to avoid selecting visited nodes, as in TSP: for the resilience task, all previously visited edges may also be valid selections again, resulting in insufficient training signals. The desired GNN model should not depend on rules like action masking to distinguish edge and graph representations for graphs with few node features. Our proposed FireGNN fulfills these requirements and obtains proper training signals. FireGNN has distinct expressive power and learns to create more meaningful and distinguishable features for each edge. FireGNN is not a simple aggregation of higher-order information of a static graph; it is inspired by homology filtration and multi-view graph augmentation. Persistent homology motivates us to aggregate more distinct node features by observing how the graph evolves toward being empty, leading to more distinct and meaningful features for each node/edge, thus avoiding the infinite loop. Extensive experimental results in Table 1 validate the necessity and effectiveness of FireGNN: existing GNNs perform poorly, while FireGNN performs well.
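To make the backtracking mechanics concrete, here is a minimal sketch of degree-preserving edge rewiring (assuming networkx; `rewire` is a hypothetical helper following the validity constraints stated in the introduction). Applying the rewiring that replaces AC, BD with AB, CD and then the rewiring that replaces AB, CD with AC, BD restores the original graph exactly, which is why an agent with indistinguishable edge representations can oscillate forever:

```python
import networkx as nx

def rewire(G, e1, e2):
    """Degree-preserving edge rewiring: remove edges (a, c) = e1 and
    (b, d) = e2, then add (a, b) and (c, d). Returns True on success,
    False if the validity constraints are violated."""
    (a, c), (b, d) = e1, e2
    if len({a, b, c, d}) < 4:
        return False  # the four endpoints must be distinct
    if not (G.has_edge(a, c) and G.has_edge(b, d)):
        return False  # AC and BD must exist
    # AB, CD, AD, BC must not already exist.
    if any(G.has_edge(u, v) for u, v in [(a, b), (c, d), (a, d), (b, c)]):
        return False
    before = dict(G.degree)
    G.remove_edge(a, c)
    G.remove_edge(b, d)
    G.add_edge(a, b)
    G.add_edge(c, d)
    assert dict(G.degree) == before  # node degrees are preserved
    return True
```

Running `rewire(G, (A, C), (B, D))` followed by `rewire(G, (A, B), (C, D))` is a no-op overall, i.e., the two-step loop between G_t and G_{t+1} described above.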



In this paper, network resilience and network robustness are used interchangeably. For a graph with a pure topological structure, the node feature matrix is not available. Removing the node with the highest degree yields a minimal number of resultant subgraphs, compared to the uniformly random removal of nodes.



Figure 2: Overview of the architecture of ResiNet to select two edges for edge rewiring.

Figure 3: The edge rewiring operation with the removal of AC, BD and the addition of AB, CD.

Figure 4: Filtration process in FireGNN.

Figure 5: The inductive ability of ResiNet on the test dataset (BA-10-30) when optimizing (a) network resilience, (b) network utility, and (c) their combination.

Figure 6: The EU power network under the adaptive degree-based attack with (a) original EU network with 217 nodes, (b) remaining EU network after a series of attacks on 40 nodes, and (c) normalized size of the largest connected component (LCC).

Figure 7: Pareto points obtained by ResiNet of balancing various combinations of resilience and utility.

Figure 7 shows that ResiNet can accommodate various values of α to balance resilience and utility during the optimization process. As shown in Figure 8, we conduct extensive experiments on the BA-15 network to demonstrate that ResiNet can learn to optimize graphs with different resilience and utility metrics and to defend against attacks other than the node degree-based attack, such as the node betweenness-based attack. Table 4 records the improvements in percentage of ResiNet for varying objectives on the BA-15 dataset. As visualized in Figure 8, ResiNet is not limited to defending against the node degree-based attack (Figure 8(b)-(j)) and also learns to defend against the betweenness-based attack (Figure 8(k)-(s)). In total, three resilience metrics are used, with R denoting the graph connectivity-based resilience metric, SR the spectral radius, and AC the algebraic connectivity. Two utility metrics are adopted, the global efficiency E_global and the local efficiency E_local.

Figure 8: Resilience maximization on the BA-15 dataset with 15 nodes and 27 edges: (a) the original network, (b)-(j) results of defending against the node degree-based attack with different combinations of resilience and utility, and (k)-(s) results of defending against the node betweenness-based attack with varying combinations of resilience and utility. For the three resilience metrics, R denotes the graph connectivity-based resilience metric, SR the spectral radius, and AC the algebraic connectivity. For the two utility metrics, E_global denotes the global efficiency and E_local the local efficiency.

Figure 9: The inductive ability of ResiNet on the test dataset (BA-20-200) when optimizing (a) network resilience and (b) the combination of resilience and utility.

Figure 11: Visualizations of the original EU network and optimized networks using ResiNet with different objectives: R means the connectivity-based resilience measurement and E global is the global efficiency.

Figure 10: Ratio of the remaining edges in subgraphs versus different filtration order K

Resilience optimization algorithm under the fixed maximal rewiring number budget of 20. Entries are in the format of X(Y ), where 1) X: weighted sum of the graph connectivity-based resilience and the network efficiency improvement (in percentage); 2) Y : required rewiring number.

The effect of the coefficient α on ResiNet. The result is shown as percentages.


Performance gain (in percentage) of ResiNet in optimizing varying objectives on the BA-15 network. All objectives are optimized with the same hyper-parameters, which means that we did not tune hyper-parameters for any objective except for R_D.

The effect of the filtration order K on ResiNet in improving network resilience (percentage).

The ratio of remaining edges in subgraphs versus the filtration order K.

Maximum filtration order K of each dataset.

Resilience optimization algorithm under the fixed maximal rewiring number budget of 200. Entries are in the format of X(Y ), where 1) X: weighted sum of the graph connectivity-based resilience and the network efficiency improvement (in percentage); 2) Y : required rewiring number. Results are averaged over 3 runs and best performance is in bold.



Running speed (in seconds) of each resilience optimization algorithm under the fixed maximal rewiring budget. Entries are in the format of X(Y), where 1) X: speed under the budget of 20; 2) Y: speed under the budget of 200. ✗ means that the result is not available in a reasonable time. Results are averaged over 3 runs and the best performance is in bold.

Characteristics of different graph-related tasks.

DEEP ANALYSIS OF WHY REGULAR GNNS FAIL IN THE RESILIENCE TASK

It is well-known that GNNs generally work well for graphs with rich features. Unluckily, the graph in the resilience task has no node/edge/graph features, with only the topological structure available.


