LEARNING HARMONIC MOLECULAR REPRESENTATIONS ON RIEMANNIAN MANIFOLDS

Abstract

Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network's expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on a 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.
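To make the spectral idea concrete, the following is a minimal sketch (not the paper's actual pipeline): a 1D ring graph Laplacian stands in for the cotangent Laplace-Beltrami operator of a real molecular surface mesh, its low-frequency eigenfunctions form the harmonic basis, and a surface signal (here a synthetic stand-in for a chemical feature) is projected onto that basis to obtain a multi-resolution representation.

```python
import numpy as np

# Toy "surface": a discretized circle (ring graph). Its combinatorial
# Laplacian plays the role of the Laplace-Beltrami operator of a mesh.
n = 200
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, -1] = L[-1, 0] = -1.0  # close the ring

# Eigendecomposition: eigh returns eigenvalues in ascending order, so the
# first columns of vecs are the lowest-frequency (harmonic) eigenfunctions.
vals, vecs = np.linalg.eigh(L)

# A synthetic surface signal with one low- and one high-frequency part.
f = np.cos(3.0 * theta) + 0.3 * np.cos(11.0 * theta)

# Spectral coefficients and a low-pass (coarse-resolution) reconstruction:
# keeping only the first 16 eigenfunctions discards the fast oscillation.
coeffs = vecs.T @ f
f_coarse = vecs[:, :16] @ coeffs[:16]
```

Truncating the eigenbasis at different cutoffs yields coarser or finer views of the same signal, which is the sense in which a spectral surface representation is multi-resolution.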

1. INTRODUCTION

Molecular representation learning is a fundamental step in AI-assisted drug discovery. Obtaining good molecular representations is crucial for the success of downstream applications including protein function prediction (Gligorijević et al., 2021) and molecular matching, e.g., protein-protein docking (Ganea et al., 2021). In general, an ideal molecular representation should integrate both geometric (e.g., 3D conformation) and chemical information (e.g., electrostatic potential). Additionally, such a representation should capture features at various resolutions to accommodate different tasks, e.g., high-level holistic features for molecular property prediction, and fine-grained features for describing whether two proteins can bind together at certain interfaces. Recently, geometric deep learning (GDL) (Bronstein et al., 2017; 2021; Monti et al., 2017) has been widely used in learning molecular representations (Atz et al., 2021; Townshend et al., 2021). GDL captures the necessary information by performing neural message passing (NMP) on common structures such as 2D/3D molecular graphs (Klicpera et al., 2020; Stokes et al., 2020), 3D voxel grids (Liu et al., 2021), and point clouds (Unke et al., 2021). Specifically, GDL encodes: a) geometric features by modeling atomic positions in Euclidean space, and b) chemical features by feeding atomic information into the message passing networks. High-level features can then be obtained by aggregating these atom-level features, which has shown promising results empirically. However, we argue that current molecular representations learned via NMP in Euclidean space are not necessarily optimal, and that they suffer from several drawbacks.
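The generic NMP pattern described above can be sketched schematically in NumPy. This is an illustrative toy, not any of the cited architectures: the adjacency matrix, atom features, and weight matrices below are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy molecular graph: 4 atoms with a hypothetical bond adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

H = rng.normal(size=(4, 8))             # atom-level (chemical) features
W_msg = 0.1 * rng.normal(size=(8, 8))   # message weights (placeholder)
W_upd = 0.1 * rng.normal(size=(8, 8))   # update weights (placeholder)

# One round of message passing: each atom sums transformed neighbor
# features, then updates its own feature; stacking rounds widens the
# receptive field over the graph.
msgs = A @ (H @ W_msg)
H = np.tanh(H @ W_upd + msgs)

# A high-level molecular representation via aggregation (mean pooling)
# of the atom-level features.
mol_repr = H.mean(axis=0)
```

Geometric information would enter through the inputs (e.g., distance-dependent edge weights in place of the 0/1 adjacency), which is exactly where equivariance constraints arise in Euclidean-space models.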
First, current GDL approaches need to employ equivariant networks (Thomas et al., 2018) to guarantee that the molecular representations transform accordingly under rotation and translation (Fuchs et al., 2020), which could undermine the network's expressive power (Cohen et al., 2018; Li et al., 2021). Therefore, a representation that properly encodes 3D molecular structure while bypassing the equivariance requirement is desirable. Second, current molecular representations in GDL are learned in a


* This work was conducted during internship at ByteDance Research.

