ASGNN: GRAPH NEURAL NETWORKS WITH ADAPTIVE STRUCTURE

Abstract

Graph neural network (GNN) models have achieved impressive results in numerous machine learning tasks. However, many existing GNN models are vulnerable to adversarial attacks, which creates a pressing need for robust GNN architectures. In this work, we propose a novel interpretable message passing scheme with adaptive structure (ASMP) to defend against adversarial attacks on the graph structure. The layers of ASMP are derived from the optimization steps that minimize an objective function jointly learning the node features and the graph structure. ASMP is adaptive in the sense that the message passing in different layers can be carried out over dynamically adjusted graphs. This property allows more fine-grained handling of the noisy (or perturbed) graph structure and hence improves robustness. Convergence properties of the ASMP scheme are theoretically established. Integrating ASMP with neural networks leads to a new family of GNN models with adaptive structure (ASGNN). Extensive experiments on semi-supervised node classification tasks demonstrate that the proposed ASGNN outperforms state-of-the-art GNN architectures in terms of classification performance under various adversarial attacks.

1. INTRODUCTION

Graphs, or networks, are ubiquitous data structures in many fields of science and engineering (Newman, 2018), such as molecular biology, computer vision, social science, and financial technology. In the past few years, owing to their appealing capability of learning representations through message passing over the graph structure, graph neural network (GNN) models have become popular choices for processing graph-structured data and have achieved astonishing success in various applications (Kipf and Welling, 2017; Bronstein et al., 2017; Wu et al., 2020; Zhou et al., 2020; Wu et al., 2022). However, existing GNN backbones such as the graph convolutional network (GCN) (Kipf and Welling, 2017) and the graph attention network (Veličković et al., 2018) are shown to be extremely vulnerable to carefully designed adversarial attacks on the graph structure (Sun et al., 2018; Jin et al., 2021; Günnemann, 2022). With unnoticeable malicious manipulations of the graph, the performance of GNNs drops significantly and may even fall below that of a simple baseline that ignores all relational information among the data features (Dai et al., 2018; Zügner et al., 2018; Zügner and Günnemann, 2019; Zhang and Zitnik, 2020). With the increasing deployment of GNN models in real-world applications, it is of vital importance to ensure their reliability and robustness, especially in scenarios such as medical diagnosis and credit scoring, where a faulty model can lead to dramatic consequences (Günnemann, 2022).

To improve the robustness of GNNs given a potentially noisy graph structure, a natural idea is to "purify" the given graph structure. Existing work in this line can be roughly classified into two categories. The first category robustifies GNNs via a two-stage approach (Wu et al., 2019b; Entezari et al., 2020; Gu et al., 2021): a purified graph is first obtained by "pre-processing" the input graph structure, leveraging information from the node features, and a GNN model is then trained on this purified graph. For example, in the GNN-Jaccard method (Wu et al., 2019b), a new graph is obtained by removing the edges with small Jaccard similarity between the endpoint features. In Entezari et al. (2020), observing that adversarial attacks tend to increase the rank of the graph adjacency matrix, the authors propose to replace the given adjacency matrix with its low-rank approximation. In the second category, the graph adjacency matrix in a GNN model is treated as unknown, and a purified graph structure in parameterized form is learned by optimizing the supervised GNN training loss (Zhu et al., 2022). For example, in Franceschi et al. (2019), the graph adjacency matrix is learned jointly with a GNN via bilevel optimization under a full parametrization of the adjacency matrix. Under this full parametrization setting, structural regularizers are further adopted in Jin et al. (2020); Luo et al. (2021) to augment the training loss function and promote certain properties of the purified graph. Besides the full parametrization approach, a multi-head weighted cosine similarity metric function (Chen et al., 2020) and a GNN model (Yu et al., 2020) have also been used to parameterize the graph adjacency matrix for structure learning. Going beyond purifying the graph structure, there are also efforts on building robust GNN architectures by directly designing the feature aggregation schemes.
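To make the first (two-stage) category concrete, the following is a minimal sketch of Jaccard-similarity-based edge pruning in the spirit of GNN-Jaccard (Wu et al., 2019b); the function name and threshold value are our own illustrative assumptions, not the authors' implementation. A GNN would subsequently be trained on the returned graph.

```python
import numpy as np

def jaccard_purify(adj, features, threshold=0.01):
    """Drop edges whose endpoints have low Jaccard feature similarity.

    A minimal sketch of the GNN-Jaccard pre-processing idea; `threshold`
    is an illustrative hyperparameter, not the value used in the paper.
    `adj` is a dense binary adjacency matrix, `features` a binary
    node-feature matrix.
    """
    purified = adj.copy()
    rows, cols = np.nonzero(np.triu(adj, k=1))  # undirected edges (i < j)
    for i, j in zip(rows, cols):
        fi, fj = features[i].astype(bool), features[j].astype(bool)
        inter = np.logical_and(fi, fj).sum()
        union = np.logical_or(fi, fj).sum()
        sim = inter / union if union > 0 else 0.0
        if sim < threshold:  # likely an adversarial edge: remove it
            purified[i, j] = purified[j, i] = 0
    return purified
```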
Under the observation that aggregation functions such as sum, weighted mean, or max can be arbitrarily distorted by even a single outlier node, Geisler et al. (2020); Wang et al. (2020); Zhang and Lu (2020) design robust GNN models via robust aggregation functions. Moreover, some works apply the attention mechanism (Veličković et al., 2018) to mitigate the influence of adversarial perturbations. For example, Zhu et al. (2019) model the node features as Gaussian distributions and use the variance information to determine the attention scores, Tang et al. (2020) use clean graphs and their adversarial counterparts to train an attention mechanism that learns to assign small attention scores to perturbed edges, and Zhang and Zitnik (2020) define an attention mechanism based on the similarity of neighboring nodes.

Different from existing approaches to robustify GNNs, in this work, we propose a novel robust and interpretable message passing scheme with adaptive structure (ASMP). Based on ASMP, a family of GNN models with adaptive structure (ASGNN) can be designed. Prior works have revealed that the message passing processes in a class of GNNs are in fact (unrolled) gradient steps for solving a graph signal denoising (GSD) problem (Zhu et al., 2021; Ma et al., 2021; Zhang and Zhao, 2022). ASMP is generated by an alternating (proximal) gradient descent algorithm that simultaneously denoises the graph signal and the graph structure; a schematic sketch of one such layer is given after the contribution list below. Designed in such a principled way, ASMP is not only friendly to back-propagation training but also achieves the desired structure adaptivity with a theoretical convergence guarantee. Once trained, ASMP can be naturally interpreted as a parameter-optimized iterative algorithm.

This work falls into the category of GNN architecture designs. Conceptually different from existing robustified GNNs with a fixed graph structure, ASGNN interweaves the graph purification process and the message passing process, which makes it possible to conduct message passing over different graph structures at different layers, i.e., in an adaptive graph structure fashion. Thus, an edge might be excluded in some layers but included in others, depending on the dynamic structure learning process. This property allows more fine-grained handling of perturbations than existing graph purification methods that use a single graph in the entire GNN. To be more specific, the major contributions of this work are highlighted in the following.

• We propose ASMP, a novel message passing scheme over graphs with a theoretical convergence guarantee. To the best of our knowledge, ASMP is the first message passing scheme with adaptive structure that is designed based on an explicit optimization problem.

• Based on ASMP, a family of GNN models with adaptive structure, named ASGNN, is further introduced. The adaptive structure in ASGNN allows more fine-grained handling of noisy graph structures and strengthens the model robustness against adversarial attacks.

• Extensive experiments under various adversarial attack scenarios showcase the superiority of the proposed ASGNN. The numerical results corroborate that the adaptive structure property inherent in ASGNN helps mitigate the impact of perturbed graph structures.
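As referenced above, here is a schematic sketch of one ASMP-style layer. The paper's exact objective and update rules are specified in later sections; purely for illustration, we assume a quadratic fidelity-plus-smoothness term for the features and an ℓ1-regularized fidelity term for the structure, so all function names, step sizes, and regularization weights below are hypothetical.

```python
import torch

def asmp_layer(X, S, X_in, A_obs, alpha=0.5, beta=0.5, lam=1e-3):
    """One schematic ASMP-style layer: alternate a gradient step on the
    node features X with a proximal gradient step on the structure S.

    A sketch of the unrolled alternating scheme described in the text,
    NOT the paper's exact updates: we assume the feature objective
    ||X - X_in||_F^2 + tr(X^T L_S X), a Frobenius fidelity of S to the
    observed adjacency A_obs, and an l1 prox (soft-thresholding) that
    promotes sparsity. Step sizes alpha, beta and weight lam are
    illustrative.
    """
    # Feature step: gradient descent on ||X - X_in||^2 + tr(X^T L_S X),
    # i.e., message passing over the *currently learned* structure S.
    L_S = torch.diag(S.sum(dim=1)) - S           # Laplacian of learned graph
    X = X - alpha * (2 * (X - X_in) + 2 * L_S @ X)

    # Structure step: gradient on ||S - A_obs||^2 + feature smoothness,
    # followed by an l1 proximal step; S is re-estimated at every layer.
    D2 = torch.cdist(X, X) ** 2                  # pairwise squared distances
    grad_S = 2 * (S - A_obs) + D2
    S = S - beta * grad_S
    S = torch.clamp(torch.abs(S) - lam, min=0) * torch.sign(S)  # soft-threshold
    S = torch.clamp((S + S.T) / 2, min=0)        # keep symmetric, nonnegative
    return X, S
```

Because the structure S is re-estimated in every layer, deeper layers pass messages over a progressively adjusted graph, which is precisely the adaptive structure behavior described above.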

2. PRELIMINARIES AND BACKGROUND

An unweighted graph with self-loops is denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ denote the node set and the edge set, respectively. The graph adjacency matrix is given by $\mathbf{A} \in \mathbb{R}^{N \times N}$. We denote by $\mathbf{1}$ and $\mathbf{I}$ the all-one column vector and the identity matrix, respectively. Given the diagonal degree matrix $\mathbf{D} = \mathrm{Diag}(\mathbf{A}\mathbf{1}) \in \mathbb{R}^{N \times N}$, the Laplacian matrix is defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$. We denote by $\mathbf{A}_{\mathrm{rw}} = \mathbf{D}^{-1}\mathbf{A}$ the random walk (or row-wise) normalized adjacency matrix and by $\mathbf{A}_{\mathrm{sym}} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ the symmetric normalized adjacency matrix. Subsequently, the random walk normalized and symmetric normalized Laplacian matrices are defined as $\mathbf{L}_{\mathrm{rw}} = \mathbf{I} - \mathbf{A}_{\mathrm{rw}}$ and $\mathbf{L}_{\mathrm{sym}} = \mathbf{I} - \mathbf{A}_{\mathrm{sym}}$, respectively.
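These definitions translate directly into code. The following minimal NumPy sketch (function and variable names are our own) builds all of the above matrices from a dense adjacency matrix with self-loops, which guarantee strictly positive degrees.

```python
import numpy as np

def graph_matrices(A):
    """Build the matrices from the preliminaries, given a dense adjacency
    matrix A (with self-loops): degree matrix D = Diag(A 1), Laplacian
    L = D - A, random walk normalization A_rw = D^{-1} A, and symmetric
    normalization A_sym = D^{-1/2} A D^{-1/2}, plus the corresponding
    normalized Laplacians."""
    deg = A.sum(axis=1)                  # degree vector d = A 1
    D = np.diag(deg)
    L = D - A                            # combinatorial Laplacian
    d_inv = 1.0 / deg                    # self-loops ensure deg > 0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_rw = d_inv[:, None] * A            # D^{-1} A
    A_sym = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    L_rw = np.eye(len(A)) - A_rw         # random walk normalized Laplacian
    L_sym = np.eye(len(A)) - A_sym       # symmetric normalized Laplacian
    return L, A_rw, A_sym, L_rw, L_sym
```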




