SELF-ATTENTIVE RATIONALIZATION FOR GRAPH CONTRASTIVE LEARNING

Abstract

Graph augmentation is the key component for revealing the instance-discriminative features of a graph as its rationale in graph contrastive learning (GCL). Existing rationale-aware augmentation mechanisms in GCL frameworks roughly fall into two categories, each with inherent limitations: (1) non-heuristic methods guided by domain knowledge to preserve salient features, which require expensive expertise and lack generality; or (2) heuristic augmentations with a co-trained auxiliary model to identify crucial substructures, which face not only a dilemma between system complexity and transformation diversity, but also instability stemming from the co-training of two separate sub-models. Inspired by recent studies on transformers, we propose Self-attentive Rationale guided Graph Contrastive Learning (SR-GCL), which integrates the rationale generator and the encoder into a single module, leverages the self-attention values of the transformer as a natural guide to delineate semantically informative substructures from both node- and edge-wise perspectives, and contrasts rationale-aware augmented pairs. On real-world biochemistry datasets, visualization results verify the effectiveness of self-attentive rationalization, and results on downstream tasks demonstrate that SR-GCL achieves state-of-the-art performance for graph model pre-training.

1. INTRODUCTION

Graph augmentation is a crucial enabler for graph contrastive learning (GCL) (You et al., 2020; Qiu et al., 2020; Zhu et al., 2020). GCL pre-trains a model to yield instance-discriminative representations by contrasting augmented samples against each other, without hand-annotated labels. To achieve this goal, early studies (You et al., 2020; 2021; Qiu et al., 2020; Zhu et al., 2020) apply random corruptions to topological structures (i.e., nodes and edges) or attributes to construct contrastive pairs. However, such random corruptions, especially on salient substructures, easily open a semantic gap between the two augmented views of the same anchor graph, misguiding the subsequent contrastive optimization (Wang et al., 2021; Li et al., 2022). To mitigate this, there has been recent interest in rationale discovery (Chang et al., 2020; Suresh et al., 2021; Li et al., 2022) as a form of graph augmentation. We systematize these studies as rationale-aware augmentations, where a rationale captures the information that discriminates a graph instance from the others. The dominant paradigm consists of two sequential modules: a rationale discovery function, which creates the rationale-aware views, and a rationale encoder, which yields their representations for contrasting. To find rationales, early studies turn to domain knowledge to highlight the salient parts of graphs (Zhu et al., 2021; Liu et al., 2022). For instance, Rong et al. (2020) leverage RDKit (Landrum, 2010), a cheminformatics toolkit, to capture crucial functional groups with high activity in molecular graphs. However, such expertise is expensive or even inaccessible in some scenarios (Tang et al., 2014). Besides, bringing in too much prior knowledge might harm generalization (Wang et al., 2022).
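The random corruptions described above can be made concrete with a minimal sketch. The two transformations below (node dropping and edge perturbation) are illustrative stand-ins for the augmentations used in the cited GCL works, not their actual implementations; function names and the edge-list representation are our own assumptions. Applying two independent corruptions to the same anchor graph yields the contrastive pair, and when a dropped node or removed edge belongs to a salient substructure, the two views can diverge semantically, which is exactly the failure mode motivating rationale-aware augmentation.

```python
import random

def drop_nodes(edges, num_nodes, drop_ratio=0.2, seed=0):
    """Randomly drop a fraction of nodes and remove their incident edges.

    `edges` is a list of (u, v) pairs over nodes 0..num_nodes-1.
    Illustrative sketch only, not the API of any GCL framework.
    """
    rng = random.Random(seed)
    n_drop = int(num_nodes * drop_ratio)
    dropped = set(rng.sample(range(num_nodes), n_drop))
    kept_edges = [(u, v) for (u, v) in edges
                  if u not in dropped and v not in dropped]
    return kept_edges, dropped

def perturb_edges(edges, num_nodes, perturb_ratio=0.2, seed=0):
    """Randomly remove a fraction of edges and add the same number of new ones."""
    rng = random.Random(seed)
    n_perturb = int(len(edges) * perturb_ratio)
    kept = rng.sample(edges, len(edges) - n_perturb)
    existing = set(kept)
    while len(existing) < len(kept) + n_perturb:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            existing.add((u, v))
    return list(existing)

# Two independently corrupted views of one anchor graph form a contrastive pair.
anchor = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
view1, dropped = drop_nodes(anchor, num_nodes=5, seed=1)
view2 = perturb_edges(anchor, num_nodes=5, seed=2)
```

Because the corruption is uniform over nodes and edges, nothing prevents `dropped` from containing a node that carries the graph's discriminative semantics; rationale-aware methods replace this uniform choice with an informed one.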
To mitigate this problem, recent efforts (Suresh et al., 2021; Li et al., 2022) instead introduce an auxiliary model, termed the rationale generator, to identify rationales automatically; it is co-trained with the rationale encoder. In this ad-hoc scheme, however, we reveal two inherent limitations: • Typically, the generator is tailor-made for one single transformation of graph data (Suresh et al., 2021; Li et al., 2022), forcing the focus on either node- or edge-wise rationales (e.g., Figures 1(b)
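The alternative suggested by the abstract, reading rationales off the encoder's own self-attention rather than a separately trained generator, can be sketched in a few lines. The toy functions below are our illustrative assumptions, not SR-GCL's actual module: a scaled dot-product attention over node features, followed by keeping the nodes that receive the most attention as the node-wise rationale.

```python
import math

def attention_scores(Q, K):
    """Scaled dot-product attention weights: row-wise softmax of QK^T / sqrt(d).

    Q and K are lists of node feature vectors; a toy stand-in for the
    self-attention of a transformer encoder, not SR-GCL's exact design.
    """
    d = len(Q[0])
    logits = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
              for q in Q]
    weights = []
    for row in logits:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        weights.append([e / s for e in exps])
    return weights

def rationale_nodes(attn, keep_ratio=0.5):
    """Score each node by the mean attention it receives; keep the top fraction."""
    n = len(attn)
    received = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    k = max(1, int(n * keep_ratio))
    top = sorted(range(n), key=lambda j: received[j], reverse=True)[:k]
    return sorted(top)

# Toy 4-node graph: node 2's features align with every other node,
# so it attracts the most attention and survives as part of the rationale.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.1, 0.1]]
attn = attention_scores(feats, feats)
kept = rationale_nodes(attn, keep_ratio=0.5)
```

The same received-attention scores could rank edges instead of nodes, which is how a single attention module can serve both node- and edge-wise perspectives without a second co-trained sub-model.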

