MOLECULE GENERATION FOR TARGET PROTEIN BINDING WITH STRUCTURAL MOTIFS

Abstract

Designing ligand molecules that bind to specific protein binding sites is a fundamental problem in structure-based drug design. Although deep generative models and geometric deep learning have made great progress in drug design, existing works either sample in the 2D graph space or fail to generate valid molecules with realistic substructures. To tackle these problems, we propose a Fragment-based LigAnd Generation framework (FLAG) that generates 3D molecules with valid and realistic substructures fragment by fragment. In FLAG, a motif vocabulary is constructed by extracting common molecular fragments (i.e., motifs) from the dataset. At each generation step, a 3D graph neural network is first employed to encode the intermediate context information. Then, our model selects the focal motif, predicts the next motif type, and attaches the new motif. The bond lengths and angles can be determined quickly and accurately by cheminformatics tools. Finally, the molecular geometry is further adjusted according to the predicted rotation angle and the structure refinement. Our model not only achieves competitive performance on conventional metrics such as binding affinity, QED, and SA, but also outperforms baselines by a large margin in generating molecules with realistic substructures.
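The motif vocabulary step described above can be illustrated with a minimal sketch. It assumes that each molecule has already been split into fragments represented as SMILES strings, and that a fragment counts as "common" when it appears at least `min_count` times. The function name `build_motif_vocabulary`, the `min_count` threshold, and the toy data are hypothetical choices for illustration and are not taken from the FLAG implementation, whose fragmentation procedure is not specified in this section.

```python
from collections import Counter


def build_motif_vocabulary(fragmented_molecules, min_count=2):
    """Map each sufficiently frequent fragment to an integer index.

    fragmented_molecules: iterable of lists of fragment strings
    (assumed to be produced by an upstream fragmentation step).
    """
    # Count how often each fragment occurs across the whole dataset.
    counts = Counter(
        fragment
        for fragments in fragmented_molecules
        for fragment in fragments
    )
    # Keep frequent fragments; sort by descending count, then by name,
    # so that the index assignment is deterministic.
    frequent = sorted(
        (frag for frag, n in counts.items() if n >= min_count),
        key=lambda frag: (-counts[frag], frag),
    )
    return {frag: idx for idx, frag in enumerate(frequent)}


# Hypothetical toy dataset: each inner list is one pre-fragmented molecule.
toy_dataset = [
    ["c1ccccc1", "C(=O)O"],
    ["c1ccccc1", "C(=O)N"],
    ["c1ccncc1", "C(=O)O"],
    ["c1ccccc1", "CO"],
]

vocab = build_motif_vocabulary(toy_dataset, min_count=2)
print(vocab)
```

In this toy example only the benzene ring and the carboxylic acid group occur at least twice, so only these two fragments enter the vocabulary. A frequency threshold of this kind is one plausible way to keep the vocabulary small; the actual criterion used by the authors may differ.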

1. INTRODUCTION

Recent years have witnessed the great success of deep learning in drug design. Within this progress, deep generative models that aim to generate molecules with desirable physicochemical and pharmacological properties are of particular importance. These models range from string-based (Gómez-Bombarelli et al., 2018) and graph-based methods (Jin et al., 2018; Xie et al., 2021) to recent 3D geometry-based methods (Gebauer et al., 2019; Luo & Ji, 2021). Molecule drugs can affect certain biological functions and pathways only by binding to their target proteins. However, the complexity of the context information, geometric constraints, and molecule-protein interactions brings great challenges. Therefore, few deep learning models have been developed to generate molecules that bind to specific protein binding sites (a.k.a. structure-based drug design). Early attempts modify pocket-free models by incorporating scoring functions, such as docking scores between generated molecules and pockets, to guide the ligand generation (Li et al., 2021). Another line of work converts the 3D pocket structures to molecular string or graph representations for conditional generation (Skalic et al., 2019; Xu et al., 2021a). These approaches fail to model explicitly how molecules interact with their target proteins in 3D space. Recently, a series of 3D generative models have been proposed to generate 3D molecules that bind to given protein pockets (Luo & Ji, 2021; Liu et al., 2022; Peng et al., 2022). They use 3D graph neural networks for context encoding and achieve equivariance. However, most of these works do not consider chemical priors and may generate

Availability

Our code is publicly available at https://github.com/zaixizhang

