LOOK IN THE MIRROR: MOLECULAR GRAPH CONTRASTIVE LEARNING WITH LINE GRAPH

Anonymous

Abstract

Trapped by label scarcity in molecular property prediction and drug design, graph contrastive learning has come to the fore. A general contrastive model consists of a view generator, a view encoder, and a contrastive loss, in which the view mainly controls what information is encoded from the input graphs. Leading contrastive learning works exhibit two kinds of view generators, namely random or learnable data corruption and domain knowledge incorporation. While effective, these two approaches respectively lead to altered molecular semantics and limited generalization capability. Thus, a decent view that fully retains molecular semantics and is free from profound domain knowledge is called for. To this end, we relate molecular graph contrastive learning to the line graph and propose a novel method termed LGCL. Specifically, by contrasting the given graph with its corresponding line graph, the graph encoder can freely encode the molecular semantics without omission. Further, considering the information inconsistency and over-smoothing that arise during learning because of the mismatched pace of message passing in the two kinds of graphs, we present a new patch with edge attribute fusion and two local contrastive losses to remedy performance. Compared with state-of-the-art (SOTA) methods for view generation, superior performance on molecular property prediction suggests the effectiveness of line graphs serving as contrasting views.
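As background for the view construction above: the line graph L(G) of a molecular graph G takes each bond (edge) of G as a node, and connects two such nodes whenever the corresponding bonds share an atom. A minimal pure-Python sketch of this standard construction (the function name `line_graph` and the edge-list representation are our illustrative choices, not the paper's implementation):

```python
from itertools import combinations

def line_graph(edges):
    """Line graph of an undirected graph given as a list of edges.

    Each input edge (bond) becomes a node; two edge-nodes are adjacent
    iff the original edges share an endpoint (atom).
    """
    nodes = [tuple(sorted(e)) for e in edges]
    lg_edges = [
        (e1, e2)
        for e1, e2 in combinations(nodes, 2)
        if set(e1) & set(e2)  # bonds sharing an atom
    ]
    return nodes, lg_edges

# Toy molecule: a three-atom path 0-1-2 (two bonds sharing atom 1).
nodes, lg_edges = line_graph([(0, 1), (1, 2)])
# nodes    == [(0, 1), (1, 2)]
# lg_edges == [((0, 1), (1, 2))]
```

In practice, each bond's attributes (bond type, aromaticity, etc.) would be carried over as the feature vector of the corresponding line-graph node, so that a standard message-passing encoder can run on L(G) unchanged.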

1. INTRODUCTION

A deep understanding of molecular properties plays a vital role in the chemical and pharmaceutical domains. To computationally discover novel materials and drugs, molecules are abstracted as graphs, in which atoms are vertices and bonds are edges Gilmer et al. (2017); Goh et al. (2017); Chen et al. (2018a). Thus, the marriage between molecular property prediction and graph learning has attracted many researchers and proven fruitful in several fields Yang et al. (2019); Song et al. (2020); Chen et al. (2021); Wu et al. (2022a). However, this direction faces the challenge of label scarcity: deep learning methods are known to consume massive amounts of labeled data, yet annotated data are often of limited size and hard to acquire in many specific domains. In addition, given the immense diversity of chemical molecules, existing supervised models can barely be reused on unseen cases Hu et al. (2020); Rong et al. (2020). Therefore, there is an increasing demand for molecular representation learning in an unsupervised or self-supervised manner. Plenty of works have attempted to learn molecular representations without the supervision of labels, such as graph context prediction Liu et al. (2019), graph-level motif prediction Rong et al. (2020), and masked attribute prediction Hu et al. (2020). In light of contrastive learning in computer vision, researchers go one step further and model molecules in a contrastive manner with data augmentations You et al. (2020); Suresh et al. (2021). Considering the inherent characteristics of chemical molecules, graph contrastive learning incorporating well-designed domain knowledge has also shown excellent capacity in molecular property prediction Sun et al. (2021); Fang et al. (2022).

Analogously, everything comes with a price. Inspecting the views generated by previous molecular graph contrastive learning methods unveils two intrinsic limitations. First, data augmentation-based methods adopting random or learnable corruption (e.g., node/edge dropping and graph generation) lead to inevitable variance in the crucial semantics and further misguide the contrastive learning
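For concreteness, the contrastive objective underlying these methods is typically an InfoNCE-style loss: it pulls two views of the same molecule together in embedding space and pushes embeddings of other molecules away. A minimal pure-Python sketch for a single anchor (the function name `info_nce` and the temperature value are our illustrative assumptions, not specifics from any cited work):

```python
import math

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE loss for one anchor embedding.

    Computes -log( exp(sim(a, p)/tau) / sum_k exp(sim(a, k)/tau) ),
    where the sum runs over the positive and all negatives, and
    sim is cosine similarity.
    """
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv)

    logits = [cos(anchor, positive) / tau]
    logits += [cos(anchor, n) / tau for n in negatives]
    m = max(logits)  # shift for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[0]

# The positive is an alternative view of the anchor molecule;
# the negatives are views of other molecules in the batch.
loss = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
```

The loss shrinks toward zero as the anchor and positive align while the negatives point elsewhere; an augmentation that corrupts the molecule's semantics makes the "positive" no longer semantically positive, which is exactly the failure mode discussed above.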

