CHASING ALL-ROUND GRAPH REPRESENTATION ROBUSTNESS: MODEL, TRAINING, AND OPTIMIZATION

Abstract

Graph Neural Networks (GNNs) have achieved state-of-the-art results on a variety of graph learning tasks; however, they have been shown to be vulnerable to adversarial attacks, raising serious security concerns. Many studies have been developed to train GNNs in a noisy environment and increase their robustness against adversarial attacks. However, existing methods have not uncovered a principled difficulty: the convoluted mixture distribution between clean and attacked data samples, which leads to sub-optimal model design and limits their robustness. In this work, we first identify the root cause of this mixture distribution and then, to tackle it, propose a novel method, GAME (Graph Adversarial Mixture of Experts), which enlarges model capacity and enriches the representation diversity of adversarial samples from three perspectives: model, training, and optimization. Specifically, we first propose a plug-and-play GAME layer that can be easily incorporated into any GNN to enhance its adversarial learning capability. Second, we design a decoupling-based graph adversarial training scheme in which the component of the model used to generate adversarial graphs is separated from the component used to update weights. Third, we introduce a graph diversity regularization that enables the model to learn diverse representations and further improves performance. Extensive experiments demonstrate the effectiveness and advantages of GAME over state-of-the-art adversarial training methods across various datasets under different attacks.

1. INTRODUCTION

Graph neural networks (GNNs) have been demonstrated to be effective at learning from graphs. They exploit a message-passing mechanism that updates node representations by iteratively aggregating information from neighbors, allowing GNNs to achieve state-of-the-art performance (Kipf & Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017). Many real-world applications are built on GNNs, such as modeling social networks (Fan et al., 2022; Zhang et al., 2019; Hu et al., 2020), scene graph reasoning (Chen et al., 2020; Zhang et al., 2022), and biological molecules (Jin et al., 2018; Xu et al., 2019; Guo et al., 2022). Nevertheless, despite their outstanding performance, GNNs are susceptible to perturbations (Zügner et al., 2018b; Zügner & Günnemann, 2019; Zheng et al., 2021; Yue et al., 2022), which necessitates techniques to improve GNNs' robustness against adversarial attacks. Attackers can degrade the performance of GNNs from multiple angles, such as adding or removing edges (Geisler et al., 2021; Chen et al., 2023), perturbing node attributes (Zügner & Günnemann, 2019; Sun et al., 2020; Tian et al., 2023), and injecting malicious nodes (Zou et al., 2021; Ju et al., 2023). To enhance GNNs' robustness, multiple defense methods against graph attacks have been proposed (Jin et al., 2020; Entezari et al., 2020; Zhang & Zitnik, 2020). However, most existing methods have not uncovered the principled difficulty (i.e., the convoluted mixture distribution between clean and attacked data samples), which results in sub-optimal model design, poor robustness, and limited performance. In light of this, we study the robustness of GNNs from a more fundamental perspective by discovering the key pattern behind the adversarial attacks that jeopardize the performance of GNNs. We begin by comparing the statistical difference between the latent representations of nodes on the clean graph and the adversarially generated graph, as shown in Figure 1.
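The layer-wise comparison described above can be reproduced in miniature. The sketch below is an illustrative assumption, not the paper's exact measurement: it uses standard GCN propagation with random weights and a linear-kernel MMD as one simple proxy for the distance between the clean and adversarial representation distributions at each layer.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalized adjacency D^{-1/2}(A + I)D^{-1/2}, as in GCN."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def layer_embeddings(A, X, weights):
    """Node representations after each ReLU(GCN) layer."""
    A_norm = normalize_adj(A)
    H, out = X, []
    for W in weights:
        H = np.maximum(A_norm @ H @ W, 0.0)  # one message-passing layer
        out.append(H)
    return out

def mmd2_linear(H_clean, H_adv):
    """Squared MMD with a linear kernel: squared distance between the mean
    representations of the two node sets, a crude layer-wise shift score."""
    diff = H_clean.mean(axis=0) - H_adv.mean(axis=0)
    return float(diff @ diff)
```

Evaluating `mmd2_linear` on the embeddings of a clean graph and a lightly edge-perturbed copy, layer by layer, yields a shift curve analogous in spirit to Figure 1.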
We observe that the distributions of node representations for clean and adversarial graphs before message passing are highly similar (i.e., Figure 1(a)). However, as the model gets deeper, the two distributions grow increasingly distinct, as demonstrated by the progressively larger shift from Figure 1(a) to (c). This indicates that adversarial attacks imperil GNN performance by generating adversarial graphs drawn from a distribution different from that of the clean graph, and that the GNN fails to transfer the knowledge learned from the clean graph to the generated adversarial graph. To address this challenge, we propose Graph Adversarial Mixture of Experts (GAME), a novel framework that enhances the robustness of GNNs by expanding model capacity and increasing representation diversity for adversarial graphs. Specifically, we design GAME from three perspectives: (i) To strengthen model capacity, we propose a plug-and-play GAME layer that accommodates adversarial graphs with diverse mixture distributions by dynamically routing among multiple assembled expert networks. (ii) From the training perspective, we present a decoupled graph adversarial training strategy, namely DECOG, in which each expert network is trained on adversarial graphs generated by maximizing the gradients of the other experts. DECOG enforces each expert to learn distinct distributions on which all other experts under-perform. (iii) From the optimization perspective, we incorporate a graph diversity regularization (GRADIV) to further diversify the knowledge learned across expert networks, so that GAME can handle various adversarial graphs. GAME is an all-round robust framework: it not only improves GNNs' resilience to adversarial attacks but also incurs little extra cost compared with a standard GNN, since GAME dynamically activates only a subset of the experts during computation.
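To make the sparse routing idea behind a mixture-of-experts graph layer concrete, here is a minimal sketch. The class name, top-1 routing rule, and GCN-style expert transform are illustrative assumptions; the actual GAME layer is defined later in the paper and may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MoEGraphLayer:
    """Sketch of a mixture-of-experts graph layer: each expert is a small
    GCN-style transform, and a gating network routes every node to its
    top-1 expert, so only one expert runs per node (sparse activation)."""

    def __init__(self, in_dim, out_dim, n_experts, rng):
        self.experts = [rng.standard_normal((in_dim, out_dim)) * 0.1
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((in_dim, n_experts)) * 0.1

    def __call__(self, A_norm, X):
        M = A_norm @ X                    # message passing / aggregation
        scores = softmax(M @ self.gate)   # per-node routing probabilities
        choice = scores.argmax(axis=-1)   # top-1 expert per node
        H = np.zeros((X.shape[0], self.experts[0].shape[1]))
        for e, W in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                # scale by the gate score so routing would remain
                # differentiable in a gradient-based implementation
                H[mask] = scores[mask, e:e+1] * np.maximum(M[mask] @ W, 0.0)
        return H, choice
```

Because each node activates a single expert, the per-node compute matches a plain GNN layer regardless of the number of experts, which is the source of the "little extra cost" property noted above.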
The contributions of this paper can be summarized as follows:
• To the best of our knowledge, this is the first work to improve GNNs' robustness from the perspective of distribution differentiation. According to our empirical studies, existing GNNs fail to transfer the knowledge learned from a clean graph's distribution to its generated adversarial counterpart, which results in vulnerability to adversarial attacks.
• To solve this challenge, we propose an all-round framework, namely Graph Adversarial Mixture of Experts (GAME), from the perspectives of model design (i.e., the GAME layer to bolster model capacity), training (i.e., DECOG to diversify the adversarial graphs), and optimization (i.e., GRADIV to further diversify the experts' knowledge).
• Comprehensive experiments are performed on multiple benchmark datasets of varying scales, demonstrating the robustness contributed by our proposed all-round GAME. The proposed method outperforms common baselines across a variety of adversarial and natural evaluations, showing that the all-round design of GAME handles the intricate mixture distribution well and addresses a fundamental difficulty in graph adversarial training.
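As a rough intuition for what a diversity regularization such as GRADIV could optimize (the concrete form of GRADIV is specified later in the paper; the penalty below is a generic stand-in), one common choice is to penalize pairwise similarity among the experts' parameters:

```python
import numpy as np

def diversity_penalty(expert_weights):
    """Hypothetical diversity regularizer: mean squared cosine similarity
    between flattened, unit-normalized expert weight matrices. Minimizing
    it pushes the experts toward mutually orthogonal (diverse) parameters."""
    flat = [W.ravel() / np.linalg.norm(W.ravel()) for W in expert_weights]
    n = len(flat)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(flat[i] @ flat[j]) ** 2
            pairs += 1
    return total / pairs
```

The penalty is 0 for orthogonal experts and 1 when all experts are identical, so adding it to the training loss discourages the experts from collapsing onto the same representation.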

2. RELATED WORK

Graph Neural Networks. Graph Neural Networks have recently attracted a great deal of interest due to their effectiveness in learning from non-Euclidean data and their remarkable performance in a vast array of graph mining tasks (Hamilton et al., 2017; Battaglia et al., 2018; Wu et al., 2020). Graph convolutional networks (GCNs) were proposed in the early stage of GNN research to transfer the concept of convolution from images to graph data (Kipf & Welling, 2017; Gao et al., 2018; Wu et al., 2019a). Instead of simply averaging the features of neighboring nodes, graph attention networks (Veličković et al., 2018; Wang et al., 2019) use an attention module to weigh each neighboring node and learn



Figure 1: The distributions of node representations generated by two GNNs trained over clean and adversarial graphs. In (a), the two distributions are extremely similar. In (b) and (c), as the model gets deeper, a progressively larger differentiation between the two distributions is observed.

