G-CENSOR: GRAPH CONTRASTIVE LEARNING WITH TASK-ORIENTED COUNTERFACTUAL VIEWS

Abstract

Graph contrastive learning (GCL) has achieved great success in learning representations from unlabeled graph-structured data. However, how to automatically obtain the optimal contrastive views w.r.t. specific downstream tasks has rarely been studied. Theoretically, a downstream task can be causally correlated to particular substructures in graphs. Existing GCL methods may fail to enhance model performance on a given task when the task-related semantics are incomplete in the positive views or preserved in the negative views. To address this problem, we propose G-CENSOR, i.e., Graph Contrastive lEarniNg with taSk-oriented cOunteRfactual views, a model-agnostic framework designed for node property prediction tasks. G-CENSOR can simultaneously generate optimal task-oriented counterfactual positive/negative views for raw ego-graphs and train graph neural networks (GNNs) with a contrastive objective between the raw ego-graphs and their corresponding counterfactual views. Extensive experiments on eight real-world datasets demonstrate that G-CENSOR consistently outperforms existing state-of-the-art GCL methods, improving the task performance and generalizability of a series of typical GNNs. To the best of our knowledge, this is a pioneering investigation of task-oriented graph contrastive learning from a counterfactual perspective for node property prediction tasks. We will release the source code after the review process.

1. INTRODUCTION

Inspired by the convincing success of contrastive learning in computer vision (Chen et al., 2020; He et al., 2020) and natural language processing (Gao et al., 2021), graph contrastive learning (GCL) has become an emerging field that extends the idea to graph data (You et al., 2020a; Hassani & Ahmadi, 2020; Zhu et al., 2021; Li et al., 2022), yielding generalizable, transferable and robust representations from unlabeled graph data (You et al., 2021). Nevertheless, the generation mechanism of contrastive views, which has been recognized as an essential component of GCL (Zhu et al., 2021; Yin et al., 2022; You et al., 2021), still faces the following challenges: (a) Independence of downstream tasks. Although GCL was originally proposed for self-supervised learning, how to obtain the optimal positive view when downstream tasks are available is an important question (Xie et al., 2022). However, most prior works, whether based on graph diffusion (Hassani & Ahmadi, 2020), uniform sampling (Zhu et al., 2020), or adaptive sampling (Zhu et al., 2021; You et al., 2021), ignore the downstream tasks' information. As shown in Figure 1, whether a generated view is an appropriate positive view depends critically on the downstream task (Chen et al., 2020). (b) Fitting spurious correlations. To introduce task information, learnable data augmentation has been investigated to automatically obtain positive views for downstream tasks (Yin et al., 2022). While these techniques have achieved promising performance, they are prone to being plagued by spurious correlations between graph structures and downstream tasks, like general supervised methods, thus hurting the generalizability of the representation model. (c) Difficulty in negative view selection. Besides positive views, negative sampling is also a vital component of GCL. Contrastive learning can benefit from hard negative samples (Joshua et al., 2021).
Meanwhile, negative samples that are actually similar to the raw instances can lead to a performance drop (Chuang et al., 2020). Therefore, it can be hard to select suitable negative samples. Some works (He et al., 2020) utilize a great number of negative samples to avoid this trade-off, but may cause scalability problems. These challenges become even more non-trivial with graph data, since graphs are far more complex due to their non-Euclidean nature (Zhu et al., 2021).

Figure 1: An illustration of task-oriented contrastive views. Task A is to predict whether a node is in a triangle, and task B is to predict the color of a node. A task-oriented view is positive if and only if it contains credible evidence for the task label; otherwise it should be negative.

In this paper, we propose a novel model-agnostic framework for node property prediction tasks, namely G-CENSOR, i.e., Graph Contrastive lEarniNg with taSk-oriented cOunteRfactual views. G-CENSOR generates high-quality positive and negative views simultaneously from a counterfactual perspective. In other words, a task-oriented counterfactual question about the contrastive views can be asked: "Would a judgment on the task label of an ego-graph change if part of the structure of the ego-graph were erased?" The answer no should be assigned to a positive view, while the answer for a negative view should be yes. Technically, G-CENSOR adopts a learnable view generation approach and leverages an original counterfactual optimization objective to decompose an ego-graph into the substructures causally correlated to the downstream task and the substructures spuriously correlated to it. These two parts can further be regarded as the positive and negative views, respectively. Learning a representation model with such contrastive views can enhance both the model's task performance and its generalizability.
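The decomposition and contrastive objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual formulation: the sigmoid edge gate, the 0.5 threshold, the temperature, and the random stand-in embeddings (which would in practice come from a shared GNN encoder) are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_ego_graph(edges, edge_scores, threshold=0.5):
    """Split an ego-graph's edge set into a counterfactual positive view
    (edges gated as causally relevant to the task) and a negative view
    (the spuriously correlated remainder), using a learned per-edge score."""
    keep = 1.0 / (1.0 + np.exp(-edge_scores)) > threshold  # sigmoid gate
    positive = [e for e, k in zip(edges, keep) if k]
    negative = [e for e, k in zip(edges, keep) if not k]
    return positive, negative

def contrastive_loss(z_anchor, z_pos, z_neg, tau=0.5):
    """InfoNCE-style loss: pull the raw ego-graph's embedding toward the
    positive view's embedding, push it away from the negative view's."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    s_pos = np.exp(cos(z_anchor, z_pos) / tau)
    s_neg = np.exp(cos(z_anchor, z_neg) / tau)
    return -np.log(s_pos / (s_pos + s_neg))

# Toy ego-graph around node 0: five edges with learned relevance scores.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
scores = np.array([2.0, 1.5, -1.0, 0.8, -2.5])
pos_view, neg_view = split_ego_graph(edges, scores)

# Embeddings would come from a shared GNN encoder; random stand-ins here.
z_raw, z_pos, z_neg = rng.normal(size=(3, 8))
loss = contrastive_loss(z_raw, z_pos, z_neg)
```

Note that the two views partition the edge set, so no in-batch negatives are needed: each ego-graph supplies its own hard negative.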
Notably, G-CENSOR does not need to contrast in-batch negative samples; this characteristic helps G-CENSOR avoid the performance-memory trade-off inherent in most prior GCL methods. The core contributions of this paper are three-fold: (a) To the best of our knowledge, this is a pioneering investigation of task-oriented graph contrastive learning from a counterfactual perspective for node property prediction tasks. (b) We develop a novel model-agnostic framework, G-CENSOR, which automatically generates both task-oriented counterfactual positive and negative views to enable sufficient and efficient graph contrastive learning. (c) We conduct extensive experiments on eight real-world datasets to demonstrate the superiority of G-CENSOR over existing state-of-the-art GCL methods in improving the performance and generalizability of typical GNNs.

2. RELATED WORK

2.1. GRAPH CONTRASTIVE LEARNING

Inspired by the success of data augmentation and contrastive learning on texts and images in addressing data noise and data scarcity, many graph contrastive learning (GCL) frameworks have been proposed lately (Liu et al., 2022; Xie et al., 2022). GCL works usually consist of a graph view generation component that constructs positive and negative views, and a contrastive objective that discriminates positive pairs from negative pairs (Xie et al., 2022). Most works generate positive views by uniform random transformations, e.g., node dropping, edge perturbation and subgraph sampling (Zhu et al., 2020; You et al., 2020b; Yu et al., 2020; Zhao et al., 2021; You et al., 2020a; Hassani & Ahmadi, 2020; Sun et al., 2020; Velickovic et al., 2019). Zhu et al. (2021) proposed adaptive data augmentation strategies that reflect the input graph's intrinsic patterns, i.e., assigning larger drop probabilities to unimportant edges. Recently, several works have proposed trainable augmentation strategies (You et al., 2021; Li et al., 2022; Yin et al., 2022) that learn a drop probability distribution over nodes or edges. However, few works have discussed how to generate optimal task-oriented positive and negative views for graph data, and no learnable augmentation strategy has been proposed for node property prediction tasks. Table 1 lists the comparison between G-CENSOR and other state-of-the-art GCL models on 4 different properties.

