GRAPHVF: CONTROLLABLE PROTEIN-SPECIFIC 3D MOLECULE GENERATION WITH VARIATIONAL FLOW

Abstract

Designing molecules that bind to specific target proteins is a fundamental task in drug discovery. Recent generative models that leverage the geometrical constraints imposed by proteins and molecules have shown great potential in generating protein-specific 3D molecules. Nevertheless, these methods fail to generate 3D molecules under 2D skeletal constraints, which encode pharmacophoric patterns essential to drug potency and synthesizability. To cope with this challenge, we propose GraphVF, which integrates geometrical and skeletal restraints into a variational flow framework, where the former is captured through a normalizing flow transformation and the latter is encoded by an amortized factorized Gaussian. We empirically verify that our method achieves state-of-the-art performance on protein-specific 3D molecule generation in terms of binding affinity and other drug properties. In particular, it is the first controllable, geometry-aware, protein-specific molecule generation method, enabling the creation of binding 3D molecules with specified chemical sub-structures or drug properties.

1. INTRODUCTION

The de novo design of synthetically feasible, drug-like molecules that bind to specific protein pockets is a crucial yet very challenging task in drug discovery. To cope with this challenge, there has been a recent surge of interest in leveraging deep generative models to search the chemical space effectively for molecules with desired properties. These machine learning models typically encode the chemical structures of molecules into a low-dimensional space, which can then be optimized and sampled to generate potential 2D or 3D molecule candidates (Jin et al., 2018; Shi et al., 2020; Zhu et al., 2022; Hoogeboom et al., 2022). Along this research line, a more promising direction has also been explored recently: generating 3D molecules that bind to given proteins. Such binding 3D molecule generation is fundamentally important because drugs exert their functions mainly by binding to their target proteins. In particular, autoregressive models that generate drug molecules (i.e., ligands) directly conditioned on the 3D geometry of the binding pocket have shown promising potential (Luo et al., 2021; Peng et al., 2022; Liu et al., 2022). These methods explicitly capture fine-grained atomic interactions in 3D space and produce ligand poses that directly fit into the given binding pocket. Nevertheless, two critical issues remain unsolved for these existing geometric approaches: 1) effective encoding and sufficient preservation of pharmacophoric structural patterns in the ligand candidates, and 2) controllable ligand generation that targets specified drug properties or sub-structures. Addressing the former prevents the generation of ligands that seem geometrically plausible yet are structurally invalid or pharmacophorically impotent; the latter largely determines the synthesizability and practical usefulness of the drugs. We elaborate on both issues next.
In practice, it is extremely valuable to keep track of the pharmacophoric patterns in existing ligands, which largely determine a ligand's biochemical activity and binding affinity (Wermuth et al., 1998). Consider, for example, the molecules of serotonin (a benign neurotransmitter) and N,N-dimethyltryptamine (DMT, a well-known hallucinogen). As can be seen in Figure 5a in Appendix E, serotonin and DMT share much of their structure, as both possess an indole and an ethylamine group, yet they differ enormously in their neural activities. In fact, the extra methyl groups in DMT's NHMe₂ group are pharmacophoric, inducing an attractive charge interaction with Asp-231 (Gomez-Jeria & Robles-Navarro, 2015). This pharmacophoric feature gives rise to DMT's binding affinity for the 5-HT2A binding site and produces hallucination. Such observations suggest that effectively enforcing pharmacophoric patterns in ligands is critical for binding. Equally important, controlling molecular properties such as solubility, polarizability and heat capacity is instrumental to drug quality. Such control helps ensure that synthesized drug molecules have good exposure in vivo, e.g., absorption, distribution, metabolism and excretion (ADME), and thus sufficient efficacy in clinical trials (Egan, 2010). It is worth noting that, although recent diffusion models like EDM (Hoogeboom et al., 2022) have become popular for their ability to perform controlled generation on these properties, performing such control while remaining pertinent to a given pocket structure for binding remains largely unexplored by previous works.

Table 1 body (column headers not recoverable from the extracted text; ✓ = yes, - = no):
(row truncated in extraction)       -  ✓  -  -  ✓
DMCG        VAE                     ✓  ✓  ✓  ✓  -  -
JT-VAE      VAE                     ✓  ✓  -  ✓  -  -
GraphAF     Autoregressive Flow     ✓  ✓  -  -  -  -
GraphBP     Autoregressive Flow     ✓  -  ✓  -  ✓  -
Pocket2Mol  Spatial Autoregression  ✓  ✓  ✓  -  ✓  -
GraphVF     Variational Flow        ✓  ✓  ✓  ✓  ✓  ✓
To address the two issues above, we propose GraphVF, a protein-aware molecule generation framework that integrates both geometrical and skeletal constraints, aiming to control the structure and properties of the generated ligands. To attain this goal, we adopt a flow-based architecture that combines amortized variational inference (Zhang et al., 2018) with autoregressive normalizing-flow generation. Specifically, the global structure of the drug ligand is organized as a junction tree (Jin et al., 2018), and the fine-grained geometrical context of the protein receptor is encoded via a valence-aware E(3)-GNN. These two constraints are integrated into a variational flow architecture, where the former constrains the variational distribution globally, while the latter conditions the flow transformations autoregressively. We show empirically that GraphVF generates drug molecules with high binding affinity to the receptor proteins, with or without the aid of reference ligands, outperforming state-of-the-art methods in terms of binding affinity and other drug properties. More importantly, GraphVF exposes a clean-cut interface for imposing customized constraints, which is extremely useful in practice for controlling the sub-structures and biochemical properties of generated drug ligands. To clarify what our proposed model can do, we compare GraphVF with several representative models for molecule generation in Table 1. Our main contributions are summarized as follows.
• We devise a novel variational flow-based framework that seamlessly integrates geometrical and skeletal restraints to improve protein-specific 3D molecule generation.
• We present the first protein-specific method that enables generating 3D molecules with specified chemical sub-structures or biochemical properties.
• We empirically demonstrate our method's superior performance over state-of-the-art approaches in generating binding 3D molecules.
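The variational flow idea above (an amortized factorized Gaussian combined with a normalizing flow) can be illustrated with a minimal NumPy sketch. It is not the GraphVF implementation: we assume a fixed vector `h` standing in for a junction-tree skeleton encoding, a fixed vector `ctx` standing in for the E(3)-GNN pocket encoding, and hypothetical linear maps (`encode_skeleton`, `affine_flow`, and the `W_*` matrices are illustrative names only). The sketch draws a latent from the factorized Gaussian via the reparameterization trick, pushes it through one context-conditioned affine flow step, and evaluates the exact log-density with the change-of-variables formula.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # latent dimension (illustrative)

def encode_skeleton(h, W_mu, W_logvar):
    """Hypothetical amortized encoder: maps a skeleton embedding h to the
    mean and log-variance of a factorized (diagonal) Gaussian q(z | h)."""
    return W_mu @ h, W_logvar @ h

def sample_gaussian(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def gaussian_log_prob(z, mu, logvar):
    """Log-density of z under N(mu, diag(exp(logvar)))."""
    return -0.5 * np.sum(np.log(2 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar))

def affine_flow(z, ctx, W_shift, W_logscale):
    """One affine flow step x = z * exp(s) + t, where the shift t and the
    log-scale s are computed from a context vector (pocket stand-in).
    Returns x and log|det dx/dz| = sum(s)."""
    t = W_shift @ ctx
    s = np.tanh(W_logscale @ ctx)  # bounded log-scale for numerical stability
    return z * np.exp(s) + t, np.sum(s)

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

# Illustrative inputs: random skeleton and pocket-context embeddings.
h = rng.standard_normal(8)
ctx = rng.standard_normal(6)
W_mu = 0.1 * rng.standard_normal((D, 8))
W_logvar = 0.1 * rng.standard_normal((D, 8))
W_shift = rng.standard_normal((D, 6))
W_logscale = rng.standard_normal((D, 6))

mu, logvar = encode_skeleton(h, W_mu, W_logvar)
z = sample_gaussian(mu, logvar, rng)
x, logdet = affine_flow(z, ctx, W_shift, W_logscale)

# Change of variables: log p(x) = log q(z) - log|det dx/dz|.
log_px = gaussian_log_prob(z, mu, logvar) - logdet
kl = kl_to_standard_normal(mu, logvar)
print(x.shape, float(log_px), float(kl))
```

In this sketch the skeleton input only shapes the Gaussian, while the context only shapes the flow step, mirroring the division of labor described in the text; a real model would stack many flow steps and learn all maps end to end.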



Table 1: Comparison among representative molecular generative methods.

Van Oord et al., 2016). This line of work is usually context-free, aiming to produce high-quality molecules from scratch or to render reasonable 3D conformations of given molecules. For example, JT-VAE (Jin et al., 2018) generates molecular graphs under the guidance of a tree-structured scaffold over chemical substructures. GraphAF (Shi et al., 2020) uses a flow-based model to generate atoms and bonds in an autoregressive manner. DMCG (Zhu et al., 2022) generates 3D conformations from 2D structures through iterative sampling and de-noising, while EDM (Hoogeboom et al., 2022) leverages equivariant diffusion to generate 3D molecules. Unlike these methods, our approach aims at generating molecules that bind to given 3D protein pockets.
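The autoregressive flow used by GraphAF can be illustrated with a simplified NumPy sketch restricted to atom types. This is not GraphAF's implementation: GraphAF computes the per-step shift and scale with a graph neural network over the partially generated graph and also generates bonds, whereas here we assume a hypothetical linear map over the counts of previously generated atom types (`context`, `W_mu`, `W_logalpha`, and the four-element vocabulary are illustrative only). Discrete one-hot types are dequantized with uniform noise so that a continuous density can be evaluated, and generation inverts the affine map and decodes each type with argmax.

```python
import numpy as np

ATOM_TYPES = ["C", "N", "O", "F"]  # illustrative vocabulary
K = len(ATOM_TYPES)
rng = np.random.default_rng(1)
W_mu = 0.5 * rng.standard_normal((K, K))        # hypothetical shift weights
W_logalpha = 0.1 * rng.standard_normal((K, K))  # hypothetical log-scale weights

def context(prev_types):
    """Hypothetical context: normalized counts of atom types generated so far
    (GraphAF instead encodes the partial graph with a GNN)."""
    counts = np.zeros(K)
    for t in prev_types:
        counts[t] += 1
    return counts / max(len(prev_types), 1)

def dequantize(atom_type, rng):
    """Turn a discrete one-hot vector into a continuous one by adding U[0, 1) noise."""
    return np.eye(K)[atom_type] + rng.uniform(0.0, 1.0, size=K)

def log_likelihood(types, rng):
    """Log-density of a dequantized atom-type sequence under the
    autoregressive affine flow, using eps = (z - mu) / alpha at each step."""
    total = 0.0
    for i, t in enumerate(types):
        c = context(types[:i])
        mu, log_alpha = W_mu @ c, W_logalpha @ c
        z = dequantize(t, rng)
        eps = (z - mu) * np.exp(-log_alpha)
        # Standard-normal log-density of eps plus log|det d eps / dz| = -sum(log_alpha).
        total += -0.5 * np.sum(eps ** 2 + np.log(2 * np.pi)) - np.sum(log_alpha)
    return total

def generate(n_atoms, rng):
    """Sample atom types one at a time: eps ~ N(0, I), z = eps * alpha + mu,
    then decode the discrete type with argmax."""
    types = []
    for _ in range(n_atoms):
        c = context(types)
        mu, log_alpha = W_mu @ c, W_logalpha @ c
        z = rng.standard_normal(K) * np.exp(log_alpha) + mu
        types.append(int(np.argmax(z)))
    return types

sample = generate(5, rng)
print([ATOM_TYPES[t] for t in sample], log_likelihood(sample, rng))
```

Because the one-hot entry lies in [1, 2) after dequantization while all other entries lie in [0, 1), argmax always recovers the original type, which is what makes this continuous relaxation of a discrete variable invertible in practice.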

