FLAG: ADVERSARIAL DATA AUGMENTATION FOR GRAPH NEURAL NETWORKS

Abstract

Data augmentation helps neural networks generalize better, but it remains an open question how to effectively augment graph data to enhance the performance of Graph Neural Networks (GNNs). While most existing graph regularizers focus on augmenting graph topological structures by adding/removing edges, we offer a novel direction: augmenting in the input node feature space. We propose a simple but effective solution, FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training and boosts performance at test time. Empirically, FLAG can be implemented with a dozen lines of code and is flexible enough to work with any GNN backbone, on a wide variety of large-scale datasets, and in both transductive and inductive settings. Without modifying a model's architecture or training setup, FLAG yields a consistent and salient performance boost across both node and graph classification tasks. Using FLAG, we reach state-of-the-art performance on the large-scale ogbg-molpcba, ogbg-ppa, and ogbg-code datasets.

1. INTRODUCTION

Graph Neural Networks (GNNs) have emerged as powerful architectures for learning and analyzing graph representations. The Graph Convolutional Network (GCN) (Kipf & Welling, 2016) and its variants have been applied to a wide range of tasks, including visual recognition (Zhao et al., 2019; Shen et al., 2018), meta-learning (Garcia & Bruna, 2017), social analysis (Qiu et al., 2018; Li & Goldwasser, 2019), and recommender systems (Ying et al., 2018). However, the training of GNNs on large-scale datasets usually suffers from overfitting, and realistic graph datasets often involve a high volume of out-of-distribution test nodes (Hu et al., 2020), posing significant challenges for prediction problems.

One promising solution to combat overfitting in deep neural networks is data augmentation (Krizhevsky et al., 2012), which is commonplace in computer vision tasks. Data augmentations apply label-preserving transformations to images, such as translations and reflections, effectively enlarging the training set while incurring negligible computational overhead. However, it remains an open problem how to generalize the notion of data augmentation to GNNs effectively. Transformations on images rely heavily on image structure, and it is challenging to design low-cost transformations that preserve semantic meaning for non-visual tasks like natural language processing (Wei & Zou, 2019) and graph learning. Generally speaking, graph data for machine learning comes with graph structure (or edge features) and node features. In the limited cases where data augmentation can be done on graphs, it focuses almost exclusively on the graph structure by adding/removing edges (Rong et al., 2019). To date, there is no study on how to manipulate graphs in node feature space for enhanced performance.
In the meantime, adversarial data augmentation, which operates in the input feature space, is known to boost neural network robustness and promote resistance to adversarially chosen inputs (Goodfellow et al., 2014; Madry et al., 2017). Despite the wide belief that adversarial training harms standard generalization and leads to worse accuracy (Tsipras et al., 2018; Balaji et al., 2019), a growing amount of attention has recently been paid to using adversarial perturbations to augment datasets and ultimately alleviate overfitting. For example, Volpi et al. (2018) showed that adversarial data augmentation is a data-dependent regularization that can help generalize to out-of-distribution samples, and its effectiveness has been verified in domains including computer vision (Xie et al., 2020), language understanding (Zhu et al., 2019; Jiang et al., 2019), and visual question answering (Gan et al., 2020). Despite the rich literature on adversarial training of GNNs for security purposes (Zügner et al., 2018; Dai et al., 2018; Bojchevski & Günnemann, 2019; Zhang & Zitnik, 2020), it remains unclear how to effectively and efficiently improve a GNN's clean accuracy using adversarial augmentation.

Present work. We propose FLAG, Free Large-scale Adversarial Augmentation on Graphs, to tackle the overfitting problem. While existing literature focuses on modifying graph structures to augment datasets, FLAG works purely in the node feature space by adding gradient-based adversarial perturbations to the input node features while leaving graph structures unchanged. FLAG leverages "free" adversarial training methods (Shafahi et al., 2019) to conduct efficient adversarial training, so that it is highly scalable on large-scale datasets. We verify the effectiveness of FLAG on the Open Graph Benchmark (OGB) (Hu et al., 2020), a collection of large-scale, realistic, and diverse graph datasets for both node and graph property prediction tasks. We conduct extensive experiments across OGB datasets by applying FLAG to widely used GNN models, namely GCN, GraphSAGE, GAT, and GIN (Kipf & Welling, 2016; Hamilton et al., 2017; Veličković et al., 2017; Xu et al., 2019), and show that FLAG brings consistent and significant improvements. For example, FLAG lifts the test accuracy of GAT on ogbn-products by an absolute 2.31%. DeeperGCN (Li et al., 2020) is another strong baseline that achieves top performance on several OGB benchmarks; FLAG enables DeeperGCN to generalize further and reach new state-of-the-art performance on ogbg-molpcba and ogbg-ppa.
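The core primitive behind such gradient-based perturbations can be illustrated with a minimal FGSM-style sketch (Goodfellow et al., 2014). The toy linear softmax classifier, analytic gradients, and all names below are illustrative assumptions for exposition, not FLAG's actual implementation (which perturbs GNN node features under autograd):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fgsm_augment(x, y, W, b, step_size=0.01):
    """One gradient-based input perturbation (FGSM-style) for a toy
    linear softmax classifier with cross-entropy loss.

    The gradient of the loss w.r.t. the input is computed analytically:
    dL/dlogits = (p - onehot(y)) up to a positive scale, and by the
    chain rule dL/dx = dL/dlogits @ W.T. The positive scale does not
    change the sign, which is all FGSM uses.
    """
    p = softmax(x @ W + b)            # predicted class probabilities
    p[np.arange(len(y)), y] -= 1.0    # dL/dlogits for cross-entropy
    grad_x = p @ W.T                  # chain rule back to the input
    # Ascend the loss surface in input space: the sign of the gradient
    # gives the direction of (locally) maximal loss increase.
    return x + step_size * np.sign(grad_x)
```

Training on such perturbed inputs, rather than defending against them, is what turns the adversarial step into a data augmentation.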
FLAG is simple (adding just a dozen lines of code), general (directly applicable to any GNN model), versatile (works in both transductive and inductive settings), and efficient (bringing salient improvements at little or even no extra cost). Our main contributions are summarized as follows:

• We propose adversarial perturbations as a data augmentation in the input node feature space to efficiently boost GNN performance. The resulting FLAG framework is a scalable and flexible augmentation scheme for GNNs, which is easy to implement and applicable to any GNN architecture for both node and graph classification tasks.
• We advance the state-of-the-art on a number of large-scale OGB datasets, often by large margins.
• We provide a detailed analysis of, and deep insights into, the effects adversarial augmentation has on GNNs.
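The "free" training loop behind FLAG can be sketched as follows, again with a toy linear softmax classifier and analytic gradients (the model, step sizes, and all names are illustrative assumptions; FLAG itself perturbs GNN node features under autograd). The key idea: the input perturbation delta is updated by gradient ascent using the same backward passes that accumulate the parameter gradients, so the ascent steps come at essentially no extra cost:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def flag_step(x, y, W, b, ascent_steps=3, ascent_size=1e-3, lr=0.1):
    """One FLAG-style 'free' training step for a toy linear classifier.

    Over `ascent_steps` inner iterations, each backward pass is reused
    twice: parameter gradients (averaged over the inner steps) are
    accumulated for a single descent update, while the input
    perturbation `delta` takes a sign-gradient ascent step.
    """
    n = len(y)
    delta = np.random.uniform(-ascent_size, ascent_size, size=x.shape)
    grad_W = np.zeros_like(W)
    grad_b = np.zeros_like(b)
    for _ in range(ascent_steps):
        p = softmax((x + delta) @ W + b)       # forward on perturbed input
        dlogits = p.copy()
        dlogits[np.arange(n), y] -= 1.0        # cross-entropy gradient
        dlogits /= n * ascent_steps            # average loss over inner steps
        grad_W += (x + delta).T @ dlogits      # accumulate parameter grads
        grad_b += dlogits.sum(axis=0)
        grad_delta = dlogits @ W.T             # same backward pass, reused
        delta = delta + ascent_size * np.sign(grad_delta)  # ascend on input
    W -= lr * grad_W                           # one descent step on parameters
    b -= lr * grad_b
    return W, b
```

With ascent_steps = 1 this degenerates to training on a single random-plus-gradient perturbation; larger values trade compute for stronger augmentation.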

2. PRELIMINARIES

Graph Neural Networks (GNNs). We denote a graph as G(V, E) with initial node features x_v for v ∈ V and edge features e_{uv} for (u, v) ∈ E. GNNs are built on graph structures to learn representation vectors h_v for every node v ∈ V and a vector h_G for the entire graph G. The k-th iteration of message passing, i.e., the k-th layer of GNN forward computation, is:

h_v^{(k)} = COMBINE^{(k)} ( h_v^{(k-1)}, AGGREGATE^{(k)} ( { (h_v^{(k-1)}, h_u^{(k-1)}, e_{uv}) : u ∈ N(v) } ) ),

where h_v^{(k)} is the embedding of node v at the k-th layer, e_{uv} is the feature vector of the edge between nodes u and v, N(v) is node v's neighbor set, and COMBINE^{(k)}(·) and AGGREGATE^{(k)}(·) are functions parameterized by neural networks. To simplify, we view the holistic message-passing pipeline as an end-to-end function f_θ(·) built on graph G:

H^{(K)} = f_θ(X; G),

where X is the input node feature matrix and H^{(K)} is the final-layer node representation matrix after K rounds of message passing. To obtain the representation h_G of the entire graph, the permutation-invariant READOUT(·) function pools node features from the final iteration K:

h_G = READOUT ( { h_v^{(K)} | v ∈ V } ).

Additionally, from the spectral convolution point of view, the k-th layer of GCN is:

H^{(k)} = σ ( D̃^{-1/2} Ã D̃^{-1/2} H^{(k-1)} W^{(k)} ),

where Ã = A + I is the adjacency matrix with self-loops added, D̃ is its diagonal degree matrix, W^{(k)} is a learnable weight matrix, and σ(·) is a non-linear activation function.
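As a concrete illustration of the spectral GCN layer and the READOUT pooling described above, here is a minimal dense NumPy sketch (toy code for exposition; production GNN libraries implement this with sparse message passing):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One spectral GCN layer: H' = ReLU(D~^{-1/2} A~ D~^{-1/2} H W),
    where A~ = A + I adds self-loops and D~ is its degree matrix.

    A: (n, n) adjacency matrix, H: (n, d) node features, W: (d, d') weights.
    """
    n = A.shape[0]
    A_tilde = A + np.eye(n)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Symmetric normalization D~^{-1/2} A~ D~^{-1/2} via broadcasting.
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)         # propagate, transform, ReLU

def readout(H):
    """A permutation-invariant READOUT: mean-pool final node embeddings
    into a single graph-level representation h_G."""
    return H.mean(axis=0)
```

Stacking K such layers and applying readout to the result yields h_G; for node-level tasks, the rows of the final H are used directly.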

