POCKET-SPECIFIC 3D MOLECULE GENERATION BY FRAGMENT-BASED AUTOREGRESSIVE DIFFUSION MODELS

Abstract

Autoregressive models are widely adopted to generate 3D molecules that fit a given protein binding pocket. Current autoregressive models suffer from two major drawbacks. First, they struggle to capture local geometric patterns because only one atom is generated at each step. Second, most autoregressive models generate atoms and chemical bonds in two separate processes, which causes a number of problems such as incorrect ring counts, biased distributions of bond lengths, and inaccurate 3D molecular structures. To tackle these problems, we designed a model, named FragDiff, that generates 3D molecules for pockets fragment by fragment. In each generation step, FragDiff places a molecular fragment around the pocket by using E(3)-equivariant diffusion generative models to simultaneously predict the atom types, atom coordinates, and chemical bonds of the fragment. Extensive experimental results confirm our hypothesis that unifying the generation of atoms and bonds can significantly improve the quality of the sampled 3D molecules, yielding more accurate distributions of 2D subgraphs and 3D substructures.

1. INTRODUCTION

Drug design has been greatly improved with the assistance of AI (Stokes et al., 2020; Zhavoronkov et al., 2019). Insilico Medicine recently announced that the world's first AI-designed drug had entered a Phase 1 clinical trial (Zhavoronkov et al., 2019). AI-based drug design has gone through several important stages. The first generation of methods focused on generating molecular graphs by leveraging various graph representation techniques (Jin et al., 2018a; b). Later, researchers realized that the biochemical functions of a small molecule are partially determined by its 3D structure, so new models were proposed to directly sample drug molecules in 3D space (Hoogeboom et al., 2022; Wu et al., 2022). Recently, an increasing number of generative models have been developed to generate molecules that bind to a target protein, based on the 3D structures of its binding pockets. A straightforward approach is to encode the geometric features of the amino acids in the protein pocket and then translate them into a molecule (Skalic et al., 2019; Xu et al., 2021a). The central problem of this end-to-end approach is that it does not explicitly characterize the atomic interactions between molecules and pockets. Although the pockets are involved, the structural complex of the target protein and the molecule is missing, so it is hard to quantify whether the generated molecules can dock into the desired pocket. To solve this problem, new models have been proposed to capture the atom-level interactions between molecules and pockets by directly sampling 3D molecules inside the 3D pockets (Masuda et al., 2020; Luo et al., 2021a; Liu et al., 2022; Peng et al., 2022). However, compared with pocket-free generation, pocket-specific models are still at an early stage and suffer from quite a few problems. Most pocket-specific models rely on an autoregressive process to generate a molecule: the atoms are placed one by one in the pocket, and the chemical bonds are predicted by a separate model.
This procedure often leads to inaccurate bond predictions and unrealistic 3D structures. For instance, it takes six steps to generate a benzene ring, which is unnecessary and error-prone. A natural solution is to adopt a fragment-based generation approach. However, generating fragments is hard because the model has to simultaneously capture the relationships among more atoms and bonds. Diffusion models have achieved state-of-the-art performance in various tasks (Ho et al., 2020), including 3D molecule generation (Luo et al., 2022; Xu et al., 2021b; Hoogeboom et al., 2022; Wu et al., 2022). They learn data distributions by gradually adding random noise to the data points and learning to recover the denoised data points. Empirically, it is nontrivial to directly apply the diffusion process to pocket-specific molecule generation: because protein-ligand complex data are still limited, it is hard to preserve the local geometric constraints of the sampled molecules (e.g., the carbon atoms of an aromatic ring lying in the same plane).

To address these problems, we design a new paradigm, named FragDiff, that generates pocket-specific 3D molecules by integrating the autoregressive and diffusion generative processes. One key observation is that a molecule usually contains multiple 3D fragments that appear inside a particular 3D pocket more frequently than other fragments. Therefore, we adopt the autoregressive process to generate molecules at the fragment level instead of the atom level. Each fragment is generated by the diffusion model instead of being extracted from a manually curated database. In this way, the autoregressive process only needs to learn to place a relatively small number of elements, and the diffusion model only needs to learn to generate a relatively small number of atoms, so local geometric and chemical constraints are easily captured. This design also helps the diffusion model capture atom interactions, as the interactions within each fragment are much denser and stronger than those between fragments.
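The noise-then-denoise idea can be made concrete with a small numerical sketch. It is not FragDiff's model: it is a generic DDPM-style forward and reverse process (Ho et al., 2020) applied only to the 3D coordinates of a single six-atom planar ring, the benzene-like example used above. Several choices are our own illustrative assumptions: the linear noise schedule (beta from 1e-4 to 0.02 over T = 1000 steps, the setting of Ho et al., 2020), the 1.39 Å ring geometry, the helper names (`make_schedule`, `q_sample`, `p_sample_loop`, `oracle_eps`, `hexagon`, `out_of_plane_rms`), and, most importantly, a hypothetical "oracle" noise predictor that is handed the true ring. The oracle stands in for the trained E(3)-equivariant network that, in a real model, would jointly predict atom types, coordinates, and bonds.

```python
import numpy as np


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule; returns beta_t, alpha_t = 1 - beta_t and alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps


def p_sample_loop(x_T, predict_eps, betas, alphas, alpha_bars, rng):
    """Ancestral reverse sampling from step T-1 down to 0, given a noise predictor."""
    x = x_T
    for t in reversed(range(len(betas))):
        eps_hat = predict_eps(x, t)
        # Mean of p(x_{t-1} | x_t) in the epsilon parameterization.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Posterior variance: (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t.
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


def oracle_eps(x0, alpha_bars):
    """HYPOTHETICAL stand-in for a trained network: returns the exact noise implied by x0."""
    def predict(x_t, t):
        return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])
    return predict


def hexagon(bond=1.39):
    """Six atoms on a regular planar hexagon in the xy-plane (benzene-like C-C ~1.39 A)."""
    angles = np.arange(6) * np.pi / 3.0
    return np.stack([bond * np.cos(angles), bond * np.sin(angles), np.zeros(6)], axis=1)


def out_of_plane_rms(x):
    """RMS distance of the atoms to their best-fit plane (0 for a perfectly planar ring)."""
    centered = x - x.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False)[-1] / np.sqrt(len(x))


rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
x0 = hexagon()
x_T, _ = q_sample(x0, len(betas) - 1, alpha_bars, rng)  # almost pure noise
x_rec = p_sample_loop(x_T, oracle_eps(x0, alpha_bars), betas, alphas, alpha_bars, rng)
print(f"planarity error: clean {out_of_plane_rms(x0):.2e}, "
      f"noised {out_of_plane_rms(x_T):.2e}, recovered {out_of_plane_rms(x_rec):.2e}")
```

Because the oracle knows the clean ring, the reverse chain returns it exactly and planarity is restored; with a learned network, how well constraints such as ring planarity survive depends entirely on what the model has learned. Denoising all six ring atoms jointly in one run, rather than placing them in six autoregressive steps, is the motivation for fragment-level generation given above.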

2. RELATED WORK

Pocket-specific molecule generation. Early pocket-specific molecule generation methods encode the pockets as latent embeddings and translate them into new molecules (Skalic et al., 2019; Xu et al., 2021a). Such methods usually represent the output molecules as 1D strings or 2D graphs, which cannot explicitly capture the interactions between atoms. More advanced approaches simultaneously generate the 2D graphs and 3D structures of the molecules given the pockets. Li et al. (2021) design a ligand network for this task by using docking scores and a Monte Carlo Tree Search algorithm. The model infers the interactions between pockets and molecules using AutoDock Vina (Eberhardt et al., 2021) rather than from the observed data, so its performance strongly relies on the accuracy of AutoDock. Masuda et al. (2020) utilized a 3D Convolutional Neural Network (CNN) to capture the spatial information and leveraged a CVAE to sample 3D molecules. The major drawback is its weak expressiveness, and the generated molecules are found to possess poor chemical and drug-like properties (Peng et al., 2022). Autoregressive models have been proposed to solve this problem and achieve SOTA performance. Luo et al. (2021a) first proposed to learn the probability densities of atoms of different types in the 3D space inside the pockets and to place atoms based on the learned distributions. Liu et al. (2022) construct local frames to place atoms around the pockets. Peng et al. (2022) further utilized an E(3)-equivariant GNN and a more efficient sampling scheme to predict the atoms and chemical bonds. However, these methods all adopt the atom-by-atom generation strategy, which may lead to unrealistic subgraphs and inaccurate local 3D structures.

Fragment-based molecule generation. Molecules usually contain multiple frequent functional groups, and quite a few works use fragments for pocket-free molecule generation. Jin et al. (2020) design a hierarchical generation model that first generates structural motifs and then decodes them into atom sets. Podda et al. (2020) utilize a GRU to generate the SMILES of fragments, and Xie et al. (2021) employ an MCMC algorithm to iteratively edit these fragments. Recently, Powers et al. (2022) designed a fragment-based generation model for pocket binding. However, they rely on prior domain knowledge and over-simplified assumptions: their model can only select from 28 manually curated fragments, which are connected by single bonds. Generating fragments from a predefined fragment vocabulary significantly limits the generation ability. Another limitation is that these methods do not consider the 3D structures of the generated fragments. These two drawbacks motivate us to develop a new model that not only generates arbitrary fragments learned from data but also determines their 3D coordinates.

Diffusion models for small molecules. Although there is no diffusion-based model for pocket-specific molecule generation, diffusion models have already been applied to multiple pocket-free molecule design tasks (Xu et al., 2021b; Wu et al., 2022; Hoogeboom et al., 2022; Jing et al., 2022). For instance, GeoDiff applies a model trained with the diffusion technique to predict the 3D conformations of molecules (Xu et al., 2021b). Hoogeboom et al. (2022) utilize an equivariant diffusion model to generate small 3D molecules, and Wu et al. (2022) integrate physical prior bridges into the diffusion process for molecule generation. In comparison to the pocket-free setting, adding the pocket informa-

