GRAPH CONTRASTIVE LEARNING UNDER HETEROPHILY: UTILIZING GRAPH FILTERS TO GENERATE GRAPH VIEWS

Anonymous

Abstract

Graph Neural Networks have achieved tremendous success in (semi-)supervised tasks for which task-specific node labels are available. However, obtaining labels is expensive in many domains, especially as graphs grow larger in size. Hence, there has been a growing interest in applying self-supervised techniques, in particular contrastive learning (CL), to graph data. In general, CL methods work by maximizing the agreement between encoded augmentations of the same example, and minimizing the agreement between encoded augmentations of different examples. However, we show that existing graph CL methods perform very poorly on graphs with heterophily, in which connected nodes tend to belong to different classes. First, we show that this is attributable to the ineffectiveness of existing graph augmentation methods. Then, we leverage graph filters to directly generate augmented graph views for graph CL under heterophily. In particular, instead of explicitly augmenting the graph topology and encoding the augmentations, we use a high-pass filter in the encoder to generate node representations based only on high-frequency graph signals. We then contrast the high-pass filtered representations with their low-pass counterparts produced by the same encoder. Our experimental results confirm that our proposed method, HLCL, outperforms state-of-the-art CL methods on benchmark graphs with heterophily by up to 10%.

1. INTRODUCTION

Graph neural networks (GNNs) are powerful tools for learning from graph-structured data in various domains, including social networks, biological compound structures, and citation networks (Kipf & Welling, 2016; Hamilton et al., 2017; Veličković et al., 2017). In general, GNNs leverage the graph's adjacency matrix to update node representations by aggregating information from their neighbors. This aggregation can be seen as a low-pass filter that smooths the graph signals and produces similar representations for neighboring nodes (Nt & Maehara, 2019). GNNs have achieved great success in supervised and semi-supervised learning, where task-specific labels are available. However, obtaining high-quality labels is very expensive in many domains, especially as graphs grow larger in size. This has motivated a recent body of work on self-supervised learning on graphs, which learns representations in an unsupervised manner (Velickovic et al., 2019; Peng et al., 2020; Qiu et al., 2020; Hassani & Khasahmadi, 2020; Zhu et al., 2020b). Among self-supervised methods, Contrastive Learning (CL) has shown great success, achieving performance comparable to its supervised counterparts (Chen et al., 2020). Contrastive learning obtains representations by maximizing the mutual information between different augmented views of the same example, and minimizing the agreement between augmented views of different examples. Despite being successful on graphs with homophily, where neighboring nodes tend to share the same label, existing graph CL methods cannot learn high-quality representations for graphs with heterophily, where connected nodes often belong to different classes (Zhu et al., 2020b). State-of-the-art graph CL methods work by contrasting the encoded node representations in two explicitly augmented graph views, generated by altering the graph topology or node features (Zhu et al., 2020c; 2021b;
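The low-pass versus high-pass filtering intuition above can be illustrated concretely. The following minimal numpy sketch (a hypothetical toy graph and random features, not the paper's actual architecture) applies the standard neighborhood-aggregation filter Ã = D^(−1/2)(A + I)D^(−1/2) as a low-pass filter, and its complement I − Ã as a high-pass filter that emphasizes differences between neighboring nodes:

```python
import numpy as np

# Toy undirected 4-node graph (hypothetical example): two "classes"
# {0, 3} and {1, 2}, where edges connect nodes of different classes.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])
X = np.random.RandomState(0).randn(4, 3)  # random node features

# Symmetrically normalized adjacency with self-loops:
# A_tilde = D^{-1/2} (A + I) D^{-1/2}
A_hat = A + np.eye(4)
deg = A_hat.sum(axis=1)
A_tilde = A_hat / np.sqrt(np.outer(deg, deg))

# Low-pass filter: the usual GNN aggregation, smoothing each node's
# signal toward its neighbors'.
low = A_tilde @ X

# High-pass filter: (I - A_tilde) X, keeping only the component of each
# node's signal that differs from its neighborhood average.
high = (np.eye(4) - A_tilde) @ X

# By construction the two filtered views decompose the input signal:
# low + high == X, so together they preserve all information while
# separating smooth (low-frequency) and sharp (high-frequency) parts.
assert np.allclose(low + high, X)
```

A contrastive objective in the spirit of the method described above would then treat `low[i]` and `high[i]` as two views of node i and maximize their agreement; on a heterophilous graph like this toy example, the high-pass view carries the class-discriminative signal that low-pass aggregation smooths away.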

