DIFFDOCK: DIFFUSION STEPS, TWISTS, AND TURNS FOR MOLECULAR DOCKING

Abstract

Predicting the binding structure of a small molecule ligand to a protein, a task known as molecular docking, is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DIFFDOCK, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational, and torsional) involved in docking and develop an efficient diffusion process on this space. Empirically, DIFFDOCK obtains a 38% top-1 success rate (RMSD<2 Å) on PDBBind, significantly outperforming the previous state of the art of traditional docking (23%) and deep learning (20%) methods. Moreover, while previous methods are not able to dock on computationally folded structures (maximum accuracy 10.4%), DIFFDOCK maintains significantly higher accuracy (21.7%). Finally, DIFFDOCK has fast inference times and provides confidence estimates with high selective accuracy.

1. INTRODUCTION

The biological functions of proteins can be modulated by small molecule ligands (such as drugs) binding to them. Thus, a crucial task in computational drug design is molecular docking: predicting the position, orientation, and conformation of a ligand when bound to a target protein, from which the effect of the ligand (if any) might be inferred. Traditional approaches for docking [Trott & Olson, 2010; Halgren et al., 2004] rely on scoring functions that estimate the correctness of a proposed structure, or pose, and an optimization algorithm that searches for the global maximum of the scoring function. However, since the search space is vast and the landscape of the scoring function is rugged, these methods tend to be too slow and inaccurate, especially for high-throughput workflows. Recent works [Stärk et al., 2022; Lu et al., 2022] have developed deep learning models that predict the binding pose in one shot, treating docking as a regression problem. While these methods are much faster than traditional search-based methods, they have yet to demonstrate significant improvements in accuracy. We argue that this may be because the regression paradigm corresponds imperfectly with the objectives of molecular docking, as reflected in the fact that standard accuracy metrics resemble the likelihood of the data under the predictive model rather than a regression loss. We thus frame molecular docking as a generative modeling problem: given a ligand and a target protein structure, we learn a distribution over ligand poses. To this end, we develop DIFFDOCK, a diffusion generative model (DGM) over the space of ligand poses for molecular docking. We define a diffusion process over the degrees of freedom involved in docking: the position of the ligand relative to the protein (locating the binding pocket), its orientation in the pocket, and the torsion angles describing its conformation.
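The three groups of degrees of freedom above can be made concrete with a small sketch. The snippet below is our illustration, not the paper's implementation; the function name, argument layout, and pivot conventions are our own assumptions. It applies per-bond torsion updates, a rigid rotation about the ligand centroid, and a translation to a set of ligand coordinates:

```python
import numpy as np
from scipy.spatial.transform import Rotation


def apply_pose_update(coords, tr, rot_vec, torsion_updates,
                      rotatable_bonds, side_masks):
    """Apply a (torsion, rotation, translation) update to ligand coordinates.

    coords:          (n, 3) array of ligand atom positions
    tr:              (3,) translation vector
    rot_vec:         (3,) axis-angle rotation applied about the ligand centroid
    torsion_updates: (m,) angle change per rotatable bond, in radians
    rotatable_bonds: list of (i, j) atom-index pairs defining each bond axis
    side_masks:      list of boolean masks selecting the atoms moved by each bond

    Note: helper conventions (e.g. pivoting torsions on atom j) are illustrative
    assumptions, not taken from the DIFFDOCK codebase.
    """
    coords = coords.copy()
    # 1) Torsional updates: rotate one side of each rotatable bond about its axis.
    for (i, j), angle, mask in zip(rotatable_bonds, torsion_updates, side_masks):
        axis = coords[j] - coords[i]
        axis = axis / np.linalg.norm(axis)
        rot = Rotation.from_rotvec(angle * axis)
        coords[mask] = rot.apply(coords[mask] - coords[j]) + coords[j]
    # 2) Rigid rotation about the centroid (orientation in the pocket).
    center = coords.mean(axis=0)
    coords = Rotation.from_rotvec(rot_vec).apply(coords - center) + center
    # 3) Rigid translation (position relative to the protein).
    return coords + tr
```

With m rotatable bonds this exposes exactly the (m + 6) degrees of freedom the paper diffuses over, while bond lengths and angles stay fixed.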
DIFFDOCK samples poses by running the learned (reverse) diffusion process, which iteratively transforms an uninformed, noisy prior distribution over ligand poses into the learned model distribution (Figure 1). Intuitively, this process can be viewed as the progressive refinement of random poses via updates of their translations, rotations, and torsion angles. While DGMs have been applied to other problems in molecular machine learning [Xu et al., 2021; Jing et al., 2022; Hoogeboom et al., 2022], existing approaches are ill-suited for molecular docking, where the space of ligand poses is an (m + 6)-dimensional submanifold M ⊂ R^{3n}, where n and m are, respectively, the number of atoms and torsion angles. To develop DIFFDOCK, we recognize that the docking degrees of freedom define M as the space of poses accessible via a set of allowed ligand pose transformations. We use this idea to map elements of M to the product space of the groups corresponding to those transformations, where a DGM can be developed and trained efficiently. As applications of docking models often require only a fixed number of predictions and a confidence score over these, we train a confidence model to provide confidence estimates for the poses sampled from the DGM and to pick out the most likely sample. This two-step process can be viewed as an intermediate approach between brute-force search and one-shot prediction: we retain the ability to consider and compare multiple poses without incurring the difficulties of high-dimensional search. Empirically, on the standard blind docking benchmark PDBBind, DIFFDOCK achieves 38% top-1 predictions with ligand root mean square distance (RMSD) below 2 Å, nearly doubling the performance of the previous state-of-the-art deep learning model (20%). DIFFDOCK significantly outperforms even state-of-the-art search-based methods (23%), while still being 3 to 12 times faster on GPU.
Moreover, it provides an accurate confidence score for its predictions, obtaining 83% RMSD<2 Å on the most confident third of the previously unseen complexes. We further evaluate the methods on structures generated by ESMFold [Lin et al., 2022]. Our results confirm previous analyses [Wong et al., 2022] showing that existing methods are not capable of docking against these approximate apo-structures (RMSD<2 Å rates at or below 10%). Instead, without further training, DIFFDOCK places 22% of its top-1 predictions within 2 Å, opening the way for accurate protein folding methods to transform the modeling of protein-ligand interactions. To summarize, the main contributions of this work are:

1. We frame the molecular docking task as a generative problem and highlight the issues with previous deep learning approaches.
2. We formulate a novel diffusion process over ligand poses corresponding to the degrees of freedom involved in molecular docking.
3. We achieve a new state-of-the-art 38% top-1 prediction rate with RMSD<2 Å on the PDBBind blind docking benchmark, considerably surpassing the previous best search-based (23%) and deep learning (20%) methods.
4. Using ESMFold to generate approximate protein apo-structures, we show that our method places its top-1 prediction with RMSD<2 Å on 22% of the complexes, nearly tripling the accuracy of the most accurate baseline.



* Equal contribution. Correspondence to {gcorso, hstark, bjing}@mit.edu.



Figure 1: Overview of DIFFDOCK. Left: The model takes as input the separate ligand and protein structures. Center: Randomly sampled initial poses are denoised via a reverse diffusion over translational, rotational, and torsional degrees of freedom. Right: The sampled poses are ranked by the confidence model to produce a final prediction and confidence score.

