GRAPHVF: CONTROLLABLE PROTEIN-SPECIFIC 3D MOLECULE GENERATION WITH VARIATIONAL FLOW

Abstract

Designing molecules that bind to specific target proteins is a fundamental task in drug discovery. Recent generative models leveraging geometrical constraints imposed by proteins and molecules have shown great potential in generating protein-specific 3D molecules. Nevertheless, these methods fail to generate 3D molecules with 2D skeletal curtailments, which encode pharmacophoric patterns essential to drug potency and synthesizability. To cope with this challenge, we propose GraphVF, which integrates geometrical and skeletal restraints into a variational flow framework, where the former is captured through a normalizing flow transformation and the latter is encoded by an amortized factorized Gaussian. We empirically verify that our method achieves state-of-the-art performance on protein-specific 3D molecule generation in terms of binding affinity and other drug properties. In particular, it represents the first controllable, geometry-aware, protein-specific molecule generation method, which enables the creation of binding 3D molecules with specified chemical sub-structures or drug properties.

1. INTRODUCTION

The de novo design of synthetically feasible drug-like molecules that bind to specific protein pockets is a crucial yet very challenging task in drug discovery. To cope with this challenge, there has been a recent surge of interest in leveraging deep generative models to effectively search the chemical space for molecules with desired properties. These machine learning models typically encode the chemical structures of molecules into a low-dimensional space, which can then be optimized and sampled to generate potential 2D or 3D molecule candidates (Jin et al., 2018; Shi et al., 2020; Zhu et al., 2022; Hoogeboom et al., 2022). Along this research line, a more promising direction has also been explored recently: generating 3D molecules that bind to given proteins. Such binding 3D molecule generation is fundamentally important because binding is what mainly facilitates the functionalities of drugs. Fortunately, leveraging autoregressive models to generate drug molecules (i.e., ligands) directly based on the 3D geometry of the binding pocket has shown promising potential (Luo et al., 2021; Peng et al., 2022; Liu et al., 2022). These methods explicitly capture the fine-grained atomic interactions in 3D space and produce ligand poses that can directly fit into the given binding pocket. Nevertheless, two critical issues remain unsolved for these existing geometric approaches: 1) effective encoding and sufficient preservation of pharmacophoric structural patterns in the ligand candidates, and 2) controllable ligand generation that aims at specified drug properties or sub-structures. The former prevents generating ligands that seem geometrically plausible, yet are structurally invalid or pharmacophorically impotent; the latter dominates the synthesizability and the practical usefulness of the drugs. We elaborate on them next.
In practice, it is extremely valuable to keep track of the pharmacophoric patterns in existing ligands, which determine a ligand's biochemical activities and binding affinity to a large extent (Wermuth et al., 1998). Consider, for example, the molecules of serotonin (a benign neurotransmitter) and N,N-Dimethyltryptamine (DMT, a famous hallucinogen). As can be seen in Figure 5a in Appendix E, serotonin and DMT share most of their structure, as both possess an indole and an ethylamine group, but they differ enormously in their neural activities. In fact, the extra methyl groups in DMT's NHMe2 are pharmacophoric, inducing an attractive charge interaction with Asp-231 (Gomez-Jeria & Robles-Navarro, 2015). This pharmacophoric feature gives rise to DMT's binding affinity with the 5-HT2A binding site and produces hallucination.

