CONTRASTIVE LEARNING OF MOLECULAR REPRESENTATION WITH FRAGMENTED VIEWS

Abstract

Molecular representation learning is a fundamental task for AI-based drug design and discovery. Contrastive learning is an attractive framework for this task, as evidenced in various domains of representation learning, e.g., image, language, and speech. However, molecule-specific ways of constructing good positive or negative views for contrastive training that account for chemical semantics have been relatively under-explored. In this paper, we consider a molecule as a bag of meaningful fragments, e.g., chemically informative substructures, obtained by disconnecting a non-ring single bond as the semantic-preserving transformation. We then propose constructing a complete (or incomplete) bag of fragments as the positive (or negative) view of a molecule: each fragment loses chemical substructures from the original molecule, while the union of the fragments does not. This provides easy positive and hard negative views simultaneously for contrastive representation learning, so that the model can selectively learn useful features and ignore nuisance features. Furthermore, we propose optimizing a torsional angle reconstruction loss around the fragmented bond to incorporate the 3D geometric structure available in the pretraining dataset. Our experiments demonstrate that our scheme outperforms prior state-of-the-art molecular representation learning methods across various downstream molecule property prediction tasks.

1. INTRODUCTION

Obtaining discriminative representations of molecules is a long-standing research problem in chemistry (Morgan, 1965). Such a task is critical for many applications such as drug discovery (Capecchi et al., 2020) and material design (Gómez-Bombarelli et al., 2018), since it is a fundamental building block for various downstream tasks, e.g., molecule property prediction (Duvenaud et al., 2015) and molecule generation (Mahmood et al., 2021). Over the past decades, researchers have focused on handcrafting molecular fingerprint representations, which encode the presence or absence of chemically meaningful substructures, e.g., functional groups, in a molecule (Rogers & Hahn, 2010). Recently, graph neural networks (GNNs) (Kipf & Welling, 2016) have gained much attention as a framework for learning molecular representations due to their remarkable performance in predicting chemical properties (Wu et al., 2018). However, they suffer from overfitting when labeled training data is scarce (Rong et al., 2020b). To resolve this issue, researchers have investigated self-supervised learning to generate supervisory signals from a large number of unlabeled molecules. A notable approach in this line of work is contrastive learning, which learns a discriminative representation by maximizing the agreement of representations of "similar" positive views while minimizing the agreement of "dissimilar" negative views (Chen et al., 2020a); it has widely demonstrated its effectiveness for representation learning not only for molecules (Wang et al., 2021; 2022), but also for other domains, e.g., image (Chen et al., 2020a; He et al., 2019), video (Pan et al., 2021), language (Wu et al., 2020), and speech (Chung et al., 2021). Here, the common challenge for learning good representations is how to construct effective positive and negative views in a self-supervised manner.
For molecule contrastive representation learning, most prior works have utilized graph-augmentation techniques, e.g., edge/node drop, to produce positive views (You et al., 2020; 2021). However, such augmentations often fail to generate proper positive views of a molecule graph, losing important chemical semantics from the anchor molecule, e.g., randomly inserting an edge into a graph may generate a non-realistic molecule (Fang et al., 2021b). Thus, semantic-preserving transformation

Contribution. In this paper, we propose Fragment-based molecule Contrastive Learning (FragCL), a novel contrastive learning method using fragments to simultaneously generate easy positive and hard negative views of a molecule graph. FragCL consists of the following key ingredients, with an overall illustration provided in Figure 1.

• Fragment-based positive view construction: We construct a new positive view of a molecule by decomposing it into a bag of meaningful fragments. We propose disconnecting a non-ring single bond of a molecule to split it in half as the semantic-preserving transformation, since such a transformation preserves most of the chemical information, e.g., the number of heteroatoms and the existence of chemically informative substructures. We then regard the complete bag of resulting fragments as an easy positive view of the molecule.

• Fragment-based negative view construction: For negative views of a molecule, we consider (a) an incomplete bag of its own fragments and (b) complete bags of fragments of other molecules (in a mini-batch). Here, (a) differs strikingly from prior molecule contrastive learning methods (You et al., 2020; Wang et al., 2021); existing works use the subgraphs of a molecule only as positive views, while we use an incomplete bag of fragments, which is a special kind of subgraph, as a negative view. Our intuition is that (a) becomes a hard-to-discriminate negative view because we fragment a molecule in half: an incomplete bag of fragments retains roughly half of the important substructures of the anchor molecule, while the other half is lost.

• Torsional angle reconstruction from fragments: We additionally propose a pretext task to incorporate 3D geometric context with fragments. We note that the torsional angle, i.e., the angle between the planes through two sets of three atoms having two atoms in common, defines several 3D contextual properties, e.g., the energy surface around the fragmented bond (Smith, 2008). Thus, the 2D graph encoder is able to learn meaningful 3D contextual information by reconstructing the torsional angle from the fragments during training.
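The fragment-based view construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the molecule is assumed to be a connected graph given as a plain bond list, the rule of choosing the non-ring single bond that yields the most balanced halves is an assumption made here for illustration, and all names (fragment_in_half, build_views, ETHYLBENZENE_BONDS) are hypothetical.

```python
from collections import deque


def connected_component(adj, start, banned_edge):
    """Atoms reachable from `start` without crossing `banned_edge`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if {u, v} == banned_edge or v in seen:
                continue
            seen.add(v)
            queue.append(v)
    return seen


def fragment_in_half(num_atoms, bonds):
    """Cut one non-ring single bond of a connected molecule graph.

    bonds: list of (i, j, order), where order == 1 marks a single bond.
    Assumption (not from the paper): among all candidate bonds, pick the
    one whose removal gives the most balanced pair of fragments.
    Returns (bond, fragment_a, fragment_b), or None if no bond qualifies.
    """
    adj = {i: set() for i in range(num_atoms)}
    for i, j, _ in bonds:
        adj[i].add(j)
        adj[j].add(i)

    best = None
    for i, j, order in bonds:
        if order != 1:
            continue  # only single bonds are candidates
        side_i = connected_component(adj, i, frozenset((i, j)))
        if j in side_i:
            continue  # still connected without this bond: it lies in a ring
        side_j = set(range(num_atoms)) - side_i
        imbalance = abs(len(side_i) - len(side_j))
        if best is None or imbalance < best[0]:
            best = (imbalance, (i, j), frozenset(side_i), frozenset(side_j))
    if best is None:
        return None
    return best[1], best[2], best[3]


def build_views(num_atoms, bonds):
    """Complete bag = positive view; each incomplete bag = hard negative."""
    result = fragment_in_half(num_atoms, bonds)
    if result is None:
        return None
    bond, frag_a, frag_b = result
    return {
        "cut_bond": bond,
        "positive": [frag_a, frag_b],
        "hard_negatives": [[frag_a], [frag_b]],
    }


# Heavy-atom graph of ethylbenzene: ring atoms 0-5 (Kekule form),
# atom 6 is the CH2 carbon and atom 7 is the CH3 carbon.
ETHYLBENZENE_BONDS = [
    (0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1),
    (0, 6, 1), (6, 7, 1),
]
views = build_views(8, ETHYLBENZENE_BONDS)
print(views["cut_bond"], [sorted(f) for f in views["positive"]])
```

In this toy example the ring single bonds are skipped because removing them does not disconnect the graph, and the ring-to-chain bond is preferred over the terminal bond because it gives the more balanced split. Complete bags of fragments from other molecules in the mini-batch, i.e., negatives of type (b), would be built by applying the same function to each of those molecules.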



Figure 1: Illustration of FragCL: contrastive learning of molecular representation with fragmented views. (a) Fragment-based view construction: We construct a bag of fragments from a molecule via fragmentation. A complete (or incomplete) bag of fragments is regarded as a positive (or negative) view of the original molecule. (b) Additionally, 3D contextual information can be learned by reconstructing the torsional angle around the fragmented bond.
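As a concrete reading of the torsional angle in panel (b), the following sketch computes the signed dihedral angle defined by four atom positions, where the middle two atoms form the fragmented bond. It only shows how such a target could be obtained from 3D coordinates; how the angle is turned into a training loss is not specified here, and the function name and the sign convention used are assumptions of this sketch.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle (radians) for the atom chain p0-p1-p2-p3.

    p1-p2 is the central bond. The angle is measured between the plane
    through (p0, p1, p2) and the plane through (p1, p2, p3).
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the first plane
    n2 = _cross(b1, b2)  # normal of the second plane
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)                    # proportional to cos(angle)
    y = _dot(_cross(b1_unit, n1), n2)   # proportional to sin(angle)
    return math.atan2(y, x)


# Toy geometry: central bond along the x-axis, first atom along +y.
cis = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
quarter = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(math.degrees(cis), abs(math.degrees(trans)), math.degrees(quarter))
```

Because the angle is periodic, the cis and trans configurations sit at 0 and at plus or minus 180 degrees respectively, which is worth keeping in mind when choosing a reconstruction target.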

