EQUIVARIANT 3D-CONDITIONAL DIFFUSION MODELS FOR MOLECULAR LINKER DESIGN

Anonymous

Abstract

Fragment-based drug discovery has been an effective paradigm in early-stage drug development. An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant 3D-conditional diffusion model for molecular linker design. Given a set of disconnected fragments, our model places the missing atoms between them and designs a molecule incorporating all the initial fragments. Unlike previous approaches, which can only connect pairs of molecular fragments, our method can link an arbitrary number of fragments. Additionally, the model automatically determines the number of atoms in the linker and its attachment points to the input fragments. We demonstrate that DiffLinker outperforms other methods on standard datasets, generating more diverse and synthetically accessible molecules. Furthermore, we evaluate our method in real-world applications, showing that it can successfully generate valid linkers conditioned on target protein pockets.
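To make the idea of a 3D-conditional model that can link "an arbitrary number of fragments" more concrete, the following is a minimal sketch, not DiffLinker's implementation. It assumes that fragments and linker are stored together as a single point cloud with a boolean mask marking which atoms are fixed context, and that the forward diffusion step takes the usual form z_t = alpha_t * x + sigma_t * eps with a variance-preserving schedule (alpha_t^2 + sigma_t^2 = 1). The helper names `make_point_cloud` and `noise_linker` are hypothetical, and atom types are omitted for brevity.

```python
import numpy as np


def make_point_cloud(fragments, linker_pos):
    """Hypothetical helper: stack any number of fragment coordinate arrays
    and the linker coordinates into one (N, 3) cloud plus a context mask."""
    frag = np.concatenate(fragments, axis=0)          # all fragment atoms
    pos = np.concatenate([frag, linker_pos], axis=0)  # full molecule
    is_context = np.zeros(len(pos), dtype=bool)
    is_context[: len(frag)] = True                    # fragment atoms are the condition
    return pos, is_context


def noise_linker(pos, is_context, alpha_t, sigma_t, rng):
    """Assumed forward step z_t = alpha_t * x + sigma_t * eps, applied only to
    the atoms being generated; context (fragment) atoms are left unchanged."""
    eps = rng.standard_normal(pos.shape)
    z = pos.copy()
    gen = ~is_context
    z[gen] = alpha_t * pos[gen] + sigma_t * eps[gen]
    return z


rng = np.random.default_rng(0)
# Three toy "fragments" with 5, 4 and 6 atoms; nothing limits this to two.
fragments = [rng.normal(size=(5, 3)), rng.normal(size=(4, 3)), rng.normal(size=(6, 3))]
linker = rng.normal(size=(3, 3))                      # 3 linker atoms
pos, is_context = make_point_cloud(fragments, linker)

alpha_t = 0.6
sigma_t = np.sqrt(1.0 - alpha_t**2)                   # variance-preserving assumption
z = noise_linker(pos, is_context, alpha_t, sigma_t, rng)
print(pos.shape, int(is_context.sum()))               # (18, 3) 15
```

Because the condition is just a set of points, adding a fourth fragment only changes the length of the `fragments` list; in this layout, the number of linker atoms is simply the number of unmasked rows.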

1. INTRODUCTION

The space of pharmacologically relevant molecules is estimated to exceed 10^60 structures (Virshup et al., 2013), and searching this space poses significant challenges for drug design. A successful approach to reducing its size is to start from fragments, small molecular compounds that usually have no more than 20 heavy (non-hydrogen) atoms. This strategy is known as fragment-based drug design (FBDD) (Erlanson et al., 2016). Given a protein pocket (a region of the target protein with properties suitable for binding a ligand), computationally identifying fragments that interact with the pocket is a cheaper and more efficient alternative to experimental high-throughput screening (Erlanson et al., 2016). Once the relevant fragments have been identified and docked to the target protein, they must still be combined into a single connected molecule. Among the available strategies, namely fragment linking, merging, and growing (Lamoree & Hubbard, 2017), fragment linking has been preferred because it can rapidly boost the binding energy between the target and the compound (Jencks, 1981; Hajduk et al., 1997). This work addresses the fragment linking problem.

Early computational methods for molecular linker design were based on database search and physical simulations (Sheng & Zhang, 2013), both of which are computationally intensive. There is therefore increasing interest in machine learning methods that can go beyond the available data and design diverse linkers more efficiently. Existing approaches are based either on syntactic pattern recognition (Yang et al., 2020) or on autoregressive models (Imrie et al., 2020; 2021; Huang et al., 2022). While the former operates solely on SMILES (Weininger, 1988), the latter take into account the 3D positions and orientations of the input fragments, as this information is essential for designing stable molecules in various applications (see Appendix A.1 for details).
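Once a model consumes 3D coordinates, it matters how its output responds when the input fragments are rotated, translated, or listed in a different atom order. The sketch below illustrates what E(3)- and permutation-equivariance mean in practice, using a simplified coordinate update in the style of E(n)-equivariant graph neural networks, x_i' = x_i + sum_j (x_i - x_j) * phi(||x_i - x_j||^2). A fixed function phi(d2) = w * exp(-d2) stands in for a learned network. This illustrates the property only and is not DiffLinker's architecture; `egnn_coord_update` and `random_orthogonal` are hypothetical names.

```python
import numpy as np


def egnn_coord_update(x, w=0.1):
    """Toy EGNN-style update. It uses coordinates only through relative
    vectors and squared distances, so it is E(3)-equivariant; summing over
    neighbours j makes it permutation-equivariant."""
    diff = x[:, None, :] - x[None, :, :]          # (N, N, 3) relative vectors x_i - x_j
    d2 = np.sum(diff**2, axis=-1, keepdims=True)  # (N, N, 1) squared distances
    return x + np.sum(diff * w * np.exp(-d2), axis=1)


def random_orthogonal(rng):
    """Orthogonal 3x3 matrix from a QR decomposition (a rotation or a
    reflection; both belong to E(3))."""
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return q


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 3))                   # 7 atoms, row vectors
Q = random_orthogonal(rng)
t = rng.standard_normal(3)
perm = rng.permutation(7)

out = egnn_coord_update(x)
# Transforming the input then updating equals updating then transforming.
rot_ok = np.allclose(egnn_coord_update(x @ Q.T + t), out @ Q.T + t)
# Reordering the atoms simply reorders the output.
perm_ok = np.allclose(egnn_coord_update(x[perm]), out[perm])
print(rot_ok, perm_ok)                            # True True
```

By contrast, a model that emits atoms one at a time in a fixed order generally fails the `perm_ok` check, since its output depends on the order in which the atoms are presented.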
However, these methods are not equivariant with respect to permutations of atoms and can only combine pairs of fragments. Linker design also depends on the target protein pocket, and using this information correctly can improve the affinity of the resulting compound. To date, however, no computational method for molecular linker design takes the pocket into account. In this work, we introduce DiffLinker, a conditional diffusion model that generates molecular linkers for a set of input fragments represented as a 3D atomic point cloud. First, our model determines the size of the prospective linker and then samples initial linker atom types and positions from the

