POCKET-SPECIFIC 3D MOLECULE GENERATION BY FRAGMENT-BASED AUTOREGRESSIVE DIFFUSION MODELS

Abstract

Autoregressive models are widely adopted to generate 3D molecules that fit a given protein binding pocket. Current autoregressive models suffer from two major drawbacks. First, it is hard to capture local geometric patterns because only one atom is generated at each step. Second, most autoregressive models generate atoms and chemical bonds in two separate processes, which causes a number of problems such as incorrect ring counts, a biased distribution of bond lengths, and inaccurate 3D molecular structures. To tackle these problems, we designed a model, named FragDiff, that generates 3D molecules fragment by fragment for pockets. In each generation step, FragDiff places a molecular fragment around the pocket by using E(3)-equivariant diffusion generative models to simultaneously predict the atom types, atom coordinates, and chemical bonds of the fragment. Extensive experimental results confirm our assumption that unifying the generation of atoms and bonds can significantly improve the quality of the sampled 3D molecules, yielding more accurate distributions of 2D subgraphs and 3D substructures.

1. INTRODUCTION

Drug design has been greatly improved with the assistance of AI (Stokes et al., 2020; Zhavoronkov et al., 2019). Insilico Medicine recently announced that the world's first drug designed by AI has entered a Phase 1 clinical trial (Zhavoronkov et al., 2019). AI-based drug design has gone through several important stages. The first generation of methods focused on generating molecular graphs by leveraging multiple graph representation techniques (Jin et al., 2018a; b). Later, researchers realized that the biochemical functions of a small molecule are partially determined by its 3D structure, so new models were proposed to sample molecular drugs directly in 3D space (Hoogeboom et al., 2022; Wu et al., 2022). Recently, an increasing number of generative models have been developed to generate molecules that can bind to a target protein based on the 3D structure of its binding pocket. A straightforward approach is to encode the geometric features of the amino acids in the protein pocket and then translate them into a molecule (Skalic et al., 2019; Xu et al., 2021a). The central problem of this end-to-end approach is that it does not explicitly characterize the atomic interactions between molecules and pockets. Although the pocket is used as input, the structure of the complex formed by the target protein and the molecule is missing, so it is hard to quantify whether the generated molecules can dock into the desired pocket. To solve this problem, new models have been proposed that capture the atom-level interactions between molecules and pockets by directly sampling 3D molecules inside the 3D pockets (Masuda et al., 2020; Luo et al., 2021a; Liu et al., 2022; Peng et al., 2022). However, in comparison to pocket-free generation, pocket-specific models are still at an early stage and suffer from quite a few problems. Most pocket-specific models rely on an autoregressive process to generate a molecule: the atoms are placed one by one in the pocket, and the chemical bonds are predicted by a separate model.
This procedure often leads to inaccurate bond predictions and unrealistic 3D structures. For instance, generating a benzene ring takes six steps, which is unnecessary and error-prone. A natural solution is to adopt a fragment-based generation approach. However, generating fragments is hard because the model has to simultaneously capture the relationships among more atoms and bonds. Diffusion models have achieved state-of-the-art performance in various tasks (Ho et al., 2020), including 3D molecule generation (Luo et al., 2022; Xu et al., 2021b; Hoogeboom et al., 2022; Wu et al., 2022).
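The step-count argument above (six steps for one benzene ring) can be made concrete with a small counting sketch. It is only an illustration under stated assumptions, not part of FragDiff: the `Fragment` record and the two helper functions are hypothetical names, and only heavy atoms are counted (hydrogens are ignored), which matches the six-step figure for benzene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record: a fragment name and its heavy-atom count."""
    name: str
    num_heavy_atoms: int


def atomwise_steps(fragments):
    """Autoregressive steps if exactly one heavy atom is placed per step."""
    return sum(f.num_heavy_atoms for f in fragments)


def fragmentwise_steps(fragments):
    """Autoregressive steps if one whole fragment is placed per step."""
    return len(fragments)


# Benzene: a single six-membered carbon ring.
benzene = [Fragment("benzene ring", 6)]

# Toluene, assumed here to decompose into a benzene ring plus a methyl group.
toluene = [Fragment("benzene ring", 6), Fragment("methyl", 1)]

for name, mol in [("benzene", benzene), ("toluene", toluene)]:
    print(name, atomwise_steps(mol), fragmentwise_steps(mol))
```

The fewer steps a molecule needs, the fewer opportunities there are for an early placement error to propagate, but each step must then model more atoms and bonds jointly, which is the difficulty noted above.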

