SPECTRAL AUGMENTATION FOR SELF-SUPERVISED LEARNING ON GRAPHS

Abstract

Graph contrastive learning (GCL), as an emerging self-supervised learning technique on graphs, aims to learn representations via instance discrimination. Its performance heavily relies on graph augmentation to reflect invariant patterns that are robust to small perturbations; yet it remains unclear what graph invariance GCL should capture. Recent studies mainly perform topology augmentations in a uniformly random manner in the spatial domain, ignoring their influence on the intrinsic structural properties embedded in the spectral domain. In this work, we aim to find a principled way to perform topology augmentations by exploring the invariance of graphs from the spectral perspective. We develop spectral augmentation, which guides topology augmentations by maximizing the spectral change. Extensive experiments on both graph and node classification tasks demonstrate the effectiveness of our method in unsupervised learning, as well as its generalization capability in transfer learning and its robustness under adversarial attacks. Our study sheds light on a general principle for graph topology augmentation.

1. INTRODUCTION

Graph neural networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2018; Xu et al., 2019) have advanced graph representation learning in a (semi-)supervised manner, yet they require supervised labels and may fail to generalize (Rong et al., 2020). To obtain more generalizable and transferable representations, the self-supervised learning (SSL) paradigm has emerged, which enables GNNs to learn from pretext tasks constructed on unlabeled graph data (Hu et al., 2020c;b; You et al., 2020b; Jin et al., 2020a). As a state-of-the-art SSL technique, graph contrastive learning (GCL) has attracted the most attention due to its remarkable empirical performance (Velickovic et al., 2019; Zhu et al., 2020; Hassani & Khasahmadi, 2020; You et al., 2021; Suresh et al., 2021; Thakoor et al., 2021). A typical GCL method works by creating augmented views of the input graph and learning representations by contrasting related graph objects against unrelated ones. Different contrastive objects have been studied on graphs, such as node-node (Zhu et al., 2020; 2021; Peng et al., 2020), node-(sub)graph (Veličković et al., 2019; Hassani & Khasahmadi, 2020; Sun et al., 2019) and graph-graph (Bielak et al., 2021; Thakoor et al., 2021; Suresh et al., 2021) contrastive pairs. The goal of GCL is to capture graph invariance by maximizing the congruence between node or graph representations in augmented views. This makes graph augmentation one of the most critical designs in GCL, as it determines the effectiveness of the contrastive objective. However, despite the variety of GCL methods proposed, it remains unclear what graph invariance GCL should capture.
Unlike images, which can be augmented to naturally highlight the main subject against the background, designing effective graph augmentations is less obvious due to the complicated topology of graphs of diverse nature (e.g., citation networks (Sen et al., 2008), social networks (Morris et al., 2020), chemical and biomedical molecules (Li et al., 2021; Hu et al., 2020b)), as discussed in the survey (Ding et al., 2022). We argue that an ideal GCL encoder should preserve structural invariance, and an effective augmentation should focus on perturbing edges that lead to large changes in structural properties; by maximizing the congruence across the resulting views, information about sensitive or fragile structures is minimized in the learned representations. Most existing works perform topology augmentations in a uniformly random manner (Zhu et al., 2020; Thakoor et al., 2021), which achieves a certain level of empirical success but is far from optimal: recent studies show that perturbations on different edges exert unequal influence on structural properties (Entezari et al., 2020; Chang et al., 2021a), whereas uniformly random edge perturbation ignores such differences. This discrepancy suggests an opportunity to improve the common uniform augmentation by accounting for structural properties so as to better capture structural invariance. Since the graph spectrum summarizes many important structural properties (Chung & Graham, 1997), we propose to preserve spectral invariance as a proxy for structural invariance, i.e., the invariance of the encoder's output to perturbations on edges that cause large changes in the graph spectrum. To realize spectral invariance, we design a principled augmentation method from the perspective of the graph spectrum, termed SPectral AugmentatioN (SPAN). Specifically, we search for topology augmentations that achieve the largest disturbance of the graph spectrum.
By identifying sensitive edges whose perturbation leads to a large spectral difference, SPAN allows the GNN encoder to focus on robust spectral components (which are hardly affected by small edge perturbations) and to reduce its dependency on unstable ones (which are easily affected by perturbation). The learned encoder thus captures minimal sufficient information about the graph (Tishby et al., 2000; Tian et al., 2020) for downstream tasks. We provide an instantiation of GCL on top of the proposed augmentation method SPAN, which can also be easily paired with different GCL paradigms as it only requires a one-time pre-computation of the edge perturbation probabilities. The effectiveness of SPAN is extensively evaluated on various benchmark datasets covering common graph learning tasks such as node classification, graph classification and graph regression. The applicability of SPAN is tested under various settings, including unsupervised learning, transfer learning and adversarial attacks. Overall, SPAN achieves remarkable performance gains over state-of-the-art baselines. Our study can potentially open up new ways for topology augmentation from the perspective of the graph spectrum.
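To make the idea of spectral sensitivity concrete, the sketch below scores each edge of a small graph by the change its removal induces in the spectrum of the symmetric normalized Laplacian. This brute-force per-edge eigendecomposition is purely illustrative (the function names are ours, not the paper's); SPAN searches for the augmentation maximizing spectral disturbance rather than enumerating single-edge removals.

```python
import numpy as np

def laplacian_spectrum(adj):
    """Eigenvalues of the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)

def edge_flip_scores(adj):
    """Score each existing edge by the L2 spectral change its removal causes.

    A brute-force stand-in for SPAN's search over augmentations, shown only
    to illustrate that different edges perturb the spectrum unequally.
    """
    base = laplacian_spectrum(adj)
    scores = {}
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j] > 0:
                pert = adj.copy()
                pert[i, j] = pert[j, i] = 0.0
                scores[(i, j)] = float(np.linalg.norm(laplacian_spectrum(pert) - base))
    return scores
```

On a toy graph (a triangle with a pendant node), the scores differ across edges, reflecting that uniform edge dropping treats structurally unequal edges as equal.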

2. RELATED WORKS

Graph Contrastive Learning (GCL) leverages the InfoMax principle (Hjelm et al., 2018) to maximize the correspondence between related objects on a graph so that properties invariant across objects are captured. Depending on how the positive objects are defined, one line of work treats different parts of a graph as positive pairs while constructing negative examples from a corrupted graph (Hu et al., 2020b; Jiao et al., 2020; Veličković et al., 2019; Peng et al., 2020; Sun et al., 2019). In such works, contrastive pairs are defined as nodes (Veličković et al., 2019) or substructures (Sun et al., 2019) vs. the entire graph, or the input graph vs. a reconstructed graph (Peng et al., 2020). The other line of work exploits graph augmentation to generate multiple views, which enables more flexible contrastive pairs (Thakoor et al., 2021; Bielak et al., 2021; Suresh et al., 2021; You et al., 2021; Feng et al., 2022; Ding et al., 2022). By generating augmented views, the GNN model is encouraged to encode crucial graph information that is invariant across views. In this work, we focus on topology augmentation. As a parallel effort in self-supervised learning, augmentation-free techniques (Lee et al., 2022; Wang et al., 2022) avoid augmentation but require special treatments (e.g., kNN search or clustering) to obtain positive and negative pairs, which is outside the scope of this work.

Graph Topology Augmentation. The most widely adopted topology augmentation is edge perturbation following a uniform distribution (Zhu et al., 2020; Thakoor et al., 2021; Bielak et al., 2021; You et al., 2020a). The underlying assumption is that each edge is equally important to the property of the input graph. However, a recent study shows that edge perturbations do not exert equal influence on the graph spectrum (Chang et al., 2021a), which summarizes a graph's structural properties.
To better preserve graph properties ignored by uniform perturbation, domain knowledge from network science has been leveraged by measuring edge importance via node centrality (Zhu et al., 2021), the global diffusion matrix (Hassani & Khasahmadi, 2020), and the random-walk based context graph (Qiu et al., 2020). While these works rely on ad-hoc heuristics, our method targets the graph spectrum, which comprehensively summarizes global graph properties and plays a crucial role in the spectral filters of GNNs. To capture minimal sufficient information from the graph and remove redundancy that could compromise downstream performance, adversarial training strategies have been paired with GCL for graph augmentation (Suresh et al., 2021; You et al., 2021; Feng et al., 2022), following the information bottleneck (IB) (Tishby et al., 2000) and InfoMin (Tian et al., 2020) principles. While adversarial augmentation requires frequent back-propagation during training, our method realizes a similar principle with a simpler yet effective augmentation that maximizes the spectral difference of views with only a one-time pre-computation.
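The one-time pre-computation contrasted here with adversarial training can be sketched as follows: per-edge spectral-change scores (however obtained) are normalized once into edge-drop probabilities, after which augmented views are sampled cheaply at every training step without any back-propagation through the augmentation. The min-max normalization and the [0, 0.5] probability range below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def drop_probabilities(scores, p_min=0.0, p_max=0.5):
    """Map per-edge spectral-change scores to edge-drop probabilities
    in [p_min, p_max] via min-max normalization (done once, before training)."""
    vals = np.array(list(scores.values()), dtype=float)
    lo, hi = vals.min(), vals.max()
    norm = (vals - lo) / (hi - lo) if hi > lo else np.zeros_like(vals)
    return {e: p_min + (p_max - p_min) * n for e, n in zip(scores, norm)}

def sample_view(adj, probs, rng):
    """Draw one augmented view by dropping each edge with its precomputed
    probability; called repeatedly during training at negligible cost."""
    view = adj.copy()
    for (i, j), p in probs.items():
        if rng.random() < p:
            view[i, j] = view[j, i] = 0.0
    return view
```

Because the probabilities are fixed after pre-computation, view sampling costs one Bernoulli draw per edge, in contrast to adversarial augmentation, which updates the perturbation by gradient steps throughout training.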

