MOLECULAR GEOMETRY PRETRAINING WITH SE(3)-INVARIANT DENOISING DISTANCE MATCHING

Abstract

Molecular representation pretraining is critical in various applications for drug and material discovery due to the limited number of labeled molecules, and most existing work focuses on pretraining on 2D molecular graphs. However, the power of pretraining on 3D geometric structures has been less explored. This is owing to the difficulty of finding a sufficient proxy task that can empower the pretraining to effectively extract essential features from the geometric structures. Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in the 3D Euclidean space forms a smooth potential energy surface, we propose GeoSSL, a 3D coordinate denoising pretraining framework to model such an energy landscape. Further, by leveraging an SE(3)-invariant score matching method, we propose GeoSSL-DDM, in which the coordinate denoising proxy task effectively boils down to denoising the pairwise atomic distances in a molecule. Our comprehensive experiments confirm the effectiveness and robustness of our proposed method.

1. INTRODUCTION

Learning effective molecular representations is critical in a variety of tasks in drug and material discovery, such as molecular property prediction [14, 20, 21, 74], de novo molecular design and optimization [7, 36, 37, 39, 53, 77], and retrosynthesis and reaction planning [4, 22, 52, 64]. Recent work based on graph neural networks (GNNs) [20] has shown superior performance thanks to the simplicity and effectiveness of GNNs in modeling graph-structured data. However, the problem remains challenging due to the limited number of labeled molecules, since labeling molecules is in general expensive and time-consuming and usually requires costly physics simulations or wet-lab experiments. As a result, there has recently been growing interest in developing pretraining or self-supervised learning methods that learn molecular representations by leveraging the huge amount of unlabeled molecule data [28, 35, 63, 75]. These methods have shown superior performance on many tasks, especially when the number of labeled molecules is insufficient.

However, one limitation of these approaches is that they represent molecules as topological graphs, and molecular representations are learned through pretraining on 2D topological structures (i.e., based on the covalent bonds). Intrinsically, though, a more natural representation for molecules is based on their 3D geometric structures, which largely determine the corresponding physical and chemical properties. Indeed, recent works [20, 38] have empirically verified the importance of applying 3D geometric information for molecular property prediction tasks. Therefore, a more promising direction is to pretrain molecular representations based on their 3D geometric structures, which is the main focus of this paper. The main challenge for molecule geometric pretraining lies in discovering an effective proxy task that empowers the pretraining to extract essential features from the 3D geometric structures.
Our proxy task is motivated by the following observations. Studies [48] have shown that molecules are not static but in continuous motion in the 3D Euclidean space, forming a potential energy surface (PES). As shown in Figure 1, it is desirable to study the molecule at a local minimum of the PES, called a conformer. However, such a stable-state conformer often comes with noise, for the following reasons. First, statistical and systematic errors in conformation estimation are unavoidable [11]. Second, it is well acknowledged that a conformer can vibrate around the local minimum of the PES. These characteristics of molecular geometry motivate us to denoise the molecular coordinates around the local minima, mimicking the computation errors and conformation vibration within the corresponding local region. The denoising goal is to learn molecular representations that are robust to such noise and effectively capture the energy surface around the local minima.

Figure 1: Illustration of the coordinate geometry of molecules. The molecule is in continuous motion, forming a potential energy surface (PES), where each 3D coordinate (x-axis) corresponds to an energy value (y-axis). The provided molecules, i.e., conformers, are at the local minima (g1). They often come with noise around the minima (e.g., statistical and systematic errors or vibrations), which can be captured using the perturbed geometry (g2).

To achieve the aforementioned goal, we first introduce a general geometric self-supervised learning framework called GeoSSL. Based on this, we further propose an SE(3)-invariant denoising distance matching pretraining algorithm, GeoSSL-DDM. In a nutshell, to capture the smooth energy surface around the local minima, we aim to maximize the mutual information (MI) between a given stable geometry and its perturbed version (i.e., g1 and g2 in Figure 1).
In practice, it is difficult to directly maximize the mutual information between two random variables. Thus, we propose to maximize an equivalent lower bound of the above mutual information, which amounts to a pretraining framework on denoising a geometric structure, coined GeoSSL. Moreover, directly denoising such noisy coordinates remains challenging because one may need to effectively constrain the pairwise atomic distances while changing the atomic coordinates. To cope with this obstacle, we further leverage an SE(3)-invariant score matching method, GeoSSL-DDM, to transform the coordinate denoising objective into the denoising of pairwise atomic distances, which can then be computed effectively. In other words, our pretraining proxy task, namely mutual information maximization, effectively boils down to an intuitive learning objective: denoising a molecule's pairwise atomic distances. Using 22 downstream geometric molecular prediction tasks, we empirically verify that our method outperforms nine pretraining baselines.

Our main contributions are summarized as follows. (1) We propose a novel geometric self-supervised learning framework, GeoSSL. To the best of our knowledge, it is the first pretraining framework focusing on pure 3D molecular data (see footnote 1). (2) To overcome the challenge of attaining the coordinate denoising objective in GeoSSL, we propose GeoSSL-DDM, an SE(3)-invariant score matching strategy that transforms this objective into the denoising of pairwise atomic distances. (3) We empirically demonstrate the effectiveness and robustness of GeoSSL-DDM on 22 downstream tasks.
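To make the distance-denoising idea concrete, the following is a minimal sketch and not the GeoSSL-DDM implementation. It assumes isotropic Gaussian coordinate noise (the noise model is an assumption here), and the names `pairwise_distances`, `perturb`, and `distance_residuals`, as well as the toy coordinates, are hypothetical. It only shows how a perturbed geometry g2 is derived from g1 and how the per-pair change in distance, the kind of quantity a distance-denoising objective would target, can be computed.

```python
import math
import random


def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def perturb(coords, sigma, rng):
    """Add isotropic Gaussian noise (an assumed noise model) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in atom) for atom in coords]


def distance_residuals(clean, noisy):
    """Per-pair change in distance between the clean and the perturbed geometry."""
    d_clean = pairwise_distances(clean)
    d_noisy = pairwise_distances(noisy)
    n = len(clean)
    return [[d_noisy[i][j] - d_clean[i][j] for j in range(n)] for i in range(n)]


# Toy three-atom geometry g1 (hypothetical coordinates, in angstroms).
g1 = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Perturbed geometry g2, drawn with a fixed seed for reproducibility.
rng = random.Random(0)
g2 = perturb(g1, sigma=0.05, rng=rng)

# Symmetric matrix with a zero diagonal: one residual per atom pair.
residuals = distance_residuals(g1, g2)
```

Working with the distance residuals rather than the raw coordinate noise is what removes the dependence on the molecule's position and orientation; the actual objective in the paper is a score matching loss, which this sketch does not reproduce.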

2.1. EQUIVARIANT GEOMETRIC MOLECULE REPRESENTATION LEARNING

Geometric representation learning. Recently, 3D geometric representation learning has been widely explored in the machine learning community, including but not limited to 3D point clouds [8, 44, 55, 67], N-body particle systems [45, 47], and 3D molecular conformation [6, 31, 32, 41, 50, 51, 56], amongst many others. The learned representation should satisfy the physical constraints, e.g., it should be equivariant to rotations and translations in the 3D Euclidean space. Such constraints can be described using group symmetry, as introduced below.

SE(3)-invariant energy. Constrained by the physical nature of 3D geometric data, a key principle we need to follow is to learn an SE(3)-equivariant representation function. SE(3) is the special Euclidean group consisting of rigid transformations in the 3D Cartesian space, where the transformations include all combinations of translations and rotations. Namely, the learned representation should be equivariant to translations and rotations of molecule geometries. We also note that the representation function need not satisfy reflection equivariance for certain tasks like molecular chirality [1]. For a more rigorous discussion, please see [17, 19, 65]. In this work, we will design an SE(3)-invariant energy (score) function in addition to the SE(3)-equivariant representation backbone model.
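As a small illustration of why pairwise distances are a convenient SE(3)-invariant quantity, the sketch below applies one rigid transformation to a toy geometry and checks that the distance matrix is unchanged while the coordinates themselves move. It is only a numerical check under stated assumptions: the rotation is restricted to the z-axis (a single element of the rotation group, chosen for brevity), and the helper names and coordinates are hypothetical.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + t for p, t in zip(point, shift))


def rigid_transform(coords, angle, shift):
    """Apply a rotation followed by a translation to every atom."""
    return [translate(rotate_z(p, angle), shift) for p in coords]


def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Toy four-atom geometry (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0), (0.1, 0.2, 1.1)]
moved = rigid_transform(coords, angle=0.7, shift=(1.5, -2.0, 0.3))

before = pairwise_distances(coords)
after = pairwise_distances(moved)
n = len(coords)

# Largest change in any pairwise distance; zero up to floating-point error.
max_gap = max(abs(before[i][j] - after[i][j]) for i in range(n) for j in range(n))

# The coordinates themselves are not invariant: they do change.
coords_changed = any(math.dist(p, q) > 1e-6 for p, q in zip(coords, moved))
```

Any function computed only from the distance matrix inherits this invariance, which is the property an SE(3)-invariant energy (score) function requires.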



Footnote 1: During the rebuttal of our submission, one of the reviewers pointed us to this parallel work [76], which is also under review. We provide a detailed comparison with this work in Appendix G.

