CONFORMATION-GUIDED MOLECULAR REPRESENTATION WITH HAMILTONIAN NEURAL NETWORKS

Abstract

Well-designed molecular representations (fingerprints) are vital for combining medical chemistry and deep learning. Although incorporating the 3D geometry of molecules (i.e., their conformations) into these representations seems beneficial, current 3D algorithms are still in their infancy. In this paper, we propose a novel molecular representation algorithm that preserves the 3D conformations of molecules with a Molecular Hamiltonian Network (HamNet). In HamNet, implicit positions and momenta of the atoms in a molecule interact in the Hamiltonian Engine following the discretized Hamiltonian equations. These implicit coordinates are supervised by real conformations through translation- and rotation-invariant losses, and are further used as inputs to the Fingerprint Generator, a message-passing neural network. Experiments show that the Hamiltonian Engine preserves molecular conformations well, and that the fingerprints generated by HamNet achieve state-of-the-art performance on MoleculeNet, a standard molecular machine learning benchmark.

1. INTRODUCTION

The past several years have seen growing activity at the intersection of medical chemistry and deep learning. Remarkable progress has been made in various applications on small molecules, ranging from generation (Jin et al., 2018; You et al., 2018) and property prediction (Gilmer et al., 2017; Cho & Choi, 2019; Klicpera et al., 2020) to protein-ligand interaction analysis (Lim et al., 2019; Wang et al., 2020), yet all these tasks rely on well-designed numerical representations, or fingerprints, of molecules. These fingerprints encode molecular structures and serve as the indicators in downstream tasks. Early work on molecular fingerprints (Morgan, 1965; Rogers & Hahn, 2010) started from encoding the two-dimensional (2D) structures of molecules, i.e., the chemical bonds between atoms, often stored as atom-bond graphs. More recently, a trend of incorporating molecular geometry into the representations has emerged (Axen et al., 2017; Cho & Choi, 2019). Molecular geometry refers to the conformation of a molecule (the three-dimensional (3D) coordinates of its atoms), which carries chemical information of wide interest, such as bond lengths and angles, and is thus vital for determining the physical, chemical, and biomedical properties of the molecule.

Although incorporating the 3D geometry of molecules does seem beneficial, 3D fingerprints, especially in combination with deep learning, are still in their infancy. The use of 3D fingerprints is limited by pragmatic considerations including i) calculation costs, ii) translational and rotational invariance, and iii) the availability of conformations, especially for the generated ligand candidates in drug discovery tasks. Furthermore, compared with current 3D algorithms, mature 2D fingerprints (Rogers & Hahn, 2010; Gilmer et al., 2017; Xiong et al., 2020) are generally more popular, with equivalent or even better performance in practice. For example, as a 2D approach, Attentive Fingerprints (Attentive FP) (Xiong et al., 2020) have become the de facto state-of-the-art approach.

To push the boundaries of leveraging 3D geometries in molecular fingerprints, we propose HamNet (Molecular Hamiltonian Networks). HamNet simulates the process of molecular dynamics (MD)

