EQUIVARIANT SHAPE-CONDITIONED GENERATION OF 3D MOLECULES FOR LIGAND-BASED DRUG DESIGN

Abstract

Shape-based virtual screening is widely used in ligand-based drug design to search chemical libraries for molecules with similar 3D shapes yet novel 2D graph structures compared to known ligands. 3D deep generative models can potentially automate this exploration of shape-conditioned 3D chemical space; however, no existing models can reliably generate geometrically realistic drug-like molecules in conformations with a specific shape. We introduce a new multimodal 3D generative model that enables shape-conditioned 3D molecular design by equivariantly encoding molecular shape and variationally encoding chemical identity. We ensure local geometric and chemical validity of generated molecules by using autoregressive fragment-based generation with heuristic bonding geometries, allowing the model to prioritize the scoring of rotatable bonds to best align the growing conformation to the target shape. We evaluate our 3D generative model in tasks relevant to drug design including shape-conditioned generation of chemically diverse molecular structures and shape-constrained molecular property optimization, demonstrating its utility over virtual screening of enumerated libraries.

1. INTRODUCTION

Generative models for de novo molecular generation have revolutionized computer-aided drug design (CADD) by enabling efficient exploration of chemical space, goal-directed molecular optimization (MO), and automated creation of virtual chemical libraries (Segler et al., 2018; Meyers et al., 2021; Huang et al., 2021; Wang et al., 2022; Du et al., 2022; Bilodeau et al., 2022). Recently, several 3D generative models have been proposed to directly generate low-energy or (bio)active molecular conformations using 3D convolutional neural networks (CNNs) (Ragoza et al., 2020), reinforcement learning (RL) (Simm et al., 2020a; b), autoregressive generators (Gebauer et al., 2022; Luo & Ji, 2022), or diffusion models (Hoogeboom et al., 2022). Development has been especially rapid for structure-based drug design (SBDD), where models are trained to generate drug-like molecules in favorable binding poses inside an explicit protein pocket (Drotár et al., 2021; Luo et al., 2022; Liu et al., 2022; Ragoza et al., 2022). However, SBDD requires atomically resolved structures of a protein target, assumes knowledge of binding sites, and often ignores dynamic pocket flexibility, rendering these methods less effective in many CADD settings. Ligand-based drug design (LBDD) does not assume knowledge of protein structure. Instead, molecules are compared against previously identified "actives" on the basis of 3D pharmacophore or 3D shape similarity, under the principle that molecules with similar structures should share similar activity (Vázquez et al., 2020; Cleves & Jain, 2020). In particular, ROCS (Rapid Overlay of Chemical Structures) is commonly used as a shape-based virtual screening tool to identify molecules with shapes similar to that of a reference inhibitor, and has shown promising results for scaffold-hopping tasks (Rush et al., 2005; Hawkins et al., 2007; Nicholls et al., 2010).
However, virtual screening relies on enumeration of chemical libraries, fundamentally restricting its ability to probe new chemical space. Here, we consider the novel task of generating chemically diverse 3D molecular structures conditioned on a molecular shape, thereby facilitating the shape-conditioned exploration of chemical space without the limitations of virtual screening (Fig. 1). Importantly, shape-conditioned 3D molecular generation presents unique challenges not encountered in typical 2D generative models:

