MOLECULE GENERATION FOR TARGET PROTEIN BINDING WITH STRUCTURAL MOTIFS

Abstract

Designing ligand molecules that bind to specific protein binding sites is a fundamental problem in structure-based drug design. Although deep generative models and geometric deep learning have made great progress in drug design, existing works either sample in the 2D graph space or fail to generate valid molecules with realistic substructures. To tackle these problems, we propose a Fragment-based LigAnd Generation framework (FLAG) to generate 3D molecules with valid and realistic substructures fragment-by-fragment. In FLAG, a motif vocabulary is constructed by extracting common molecular fragments (i.e., motifs) from the dataset. At each generation step, a 3D graph neural network is first employed to encode the intermediate context information. Then, our model selects the focal motif, predicts the next motif type, and attaches the new motif. The bond lengths/angles can be quickly and accurately determined by cheminformatics tools. Finally, the molecular geometry is further adjusted according to the predicted rotation angle and the structure refinement. Our model not only achieves competitive performance on conventional metrics such as binding affinity, QED, and SA, but also outperforms baselines by a large margin in generating molecules with realistic substructures.

1. INTRODUCTION

Recent years have witnessed the great success of deep learning in drug design. Among these advances, deep generative models that aim to generate molecules with desirable physicochemical and pharmacological properties are of particular importance. These models range from string-based (Gómez-Bombarelli et al., 2018) and graph-based methods (Jin et al., 2018; Xie et al., 2021) to recent 3D geometry-based methods (Gebauer et al., 2019; Luo & Ji, 2021). Molecular drugs can only affect biological functions and pathways by binding to their target proteins. However, the complexity of the context information, the geometric constraints, and the molecule-protein interactions pose great challenges. As a result, few deep learning models have been developed to generate molecules that bind to specific protein binding sites (a.k.a. structure-based drug design). Early attempts modify pocket-free models by incorporating scoring functions, such as docking scores between generated molecules and pockets, to guide the ligand generation (Li et al., 2021). Another line of work converts the 3D pocket structures to molecular string or graph representations for conditional generation (Skalic et al., 2019; Xu et al., 2021a). These methods fail to model explicitly how molecules interact with their target proteins in 3D space. Recently, a series of 3D generative models have been proposed to generate 3D molecules that bind to given protein pockets (Luo & Ji, 2021; Liu et al., 2022; Peng et al., 2022). They use 3D graph neural networks for context encoding and achieve equivariance. However, most of these works do not consider chemical priors and may generate invalid molecules with unrealistic substructures. Their atom-wise generation scheme also leads to inefficient molecule sampling. In this work, we propose a novel Fragment-based LigAnd Generation framework (FLAG) for structure-based drug design, where the molecules are generated fragment-by-fragment.
To generate 3D molecules, we first preprocess the dataset and extract molecular fragments with high occurrence frequencies (i.e., motifs) as "building blocks" for new molecules. At each generation step, a 3D graph neural network is first employed to encode the intermediate context information, including the protein pocket and the intermediate molecular graph. Next, our model selects the focal motif, predicts the next motif type, and attaches the new motif to the generated molecule. Attaching motifs in 3D space is a great challenge. Inspired by the fact that the flexibility of molecular geometries lies largely in the rotatable bonds (a bond in a molecule is rotatable if cutting it creates two connected components, each of which has at least two atoms) (Axelrod & Gomez-Bombarelli, 2022), we employ cheminformatics tools (Bento et al., 2020) to efficiently determine the bond lengths/angles and train neural networks to predict the torsion angles. Leveraging this insight significantly reduces the search space of atom/motif coordinates. For example, a molecule in the CrossDocked dataset (Francoeur et al., 2020) has, on average, m = 24 heavy atoms, corresponding to a 3m-dimensional Euclidean space, but only around 5 torsion angles of rotatable bonds. Furthermore, a rotation angle is predicted to further adjust the geometries of generated molecules. Inspired by force fields in physics (Rappé et al., 1992), a novel structure refinement is finally applied to optimize the molecule structures. We conduct extensive experiments to evaluate our approach. Experimental results show that: (1) our method is able to generate diverse drug-like molecules with high binding affinity to target proteins; (2) FLAG is much faster than most of the baseline methods at sampling new molecules; (3) thanks to the design of fragment-based generation, our method outperforms baselines by a large margin in generating valid molecules with realistic substructures.

2. RELATED WORK

Motif-based Molecule Generation. To generate more valid molecules with realistic substructures, many models adopt prior knowledge of chemical motifs, also known as fragments or rationales, as building blocks to generate or optimize molecules (Jin et al., 2018; 2020a; Podda et al., 2020; Jin et al., 2020b; Chen et al., 2021a; Seo et al., 2021; Xie et al., 2021; Chen et al., 2021b; Guo et al., 2022; Flam-Shepherd et al., 2022). For example, JT-VAE (Jin et al., 2018) first decomposes the molecular graphs into junction trees, where each node in the tree represents a substructure of the molecule. JT-VAE then adopts the variational autoencoder as the framework and learns to reconstruct the molecular graph fragment-by-fragment. Similarly, RationaleRL extracts rationales that lead to different properties of molecules by MCTS. It is then trained with reinforcement learning to expand rationales into complete molecular graphs. However, the aforementioned methods can neither generate 3D molecules directly nor take into account the complicated context information of binding sites.

3D Molecule Generation. With the development of geometric deep learning, many recent works explore the generation of 3D molecular geometries either from given 2D molecular graphs (Mansimov et al., 2019; Simm & Hernandez-Lobato, 2020; Luo et al., 2021b; Shi et al., 2021; Ganea et al., 2021; Xu et al., 2021b; 2022) or from scratch (Gebauer et al., 2019; Hoogeboom et al., 2022; Nesterov et al., 2020; Gebauer et al., 2022; Luo & Ji, 2021; Satorras et al., 2021). Comparatively, the task of structure-based drug design is more challenging. Firstly, the 2D molecular graph is unknown. Secondly, the generated molecules should fit well into the binding pockets with high binding affinity. Finally, the aforementioned works usually deal with small organic molecules and may be insufficient to generate 3D drug-like molecules with larger molecular weights. For more detailed discussions on molecule generation, we refer readers to the comprehensive survey (Du et al., 2022).

Structure-based Drug Design. Structure-based drug design aims to generate 3D molecules that bind to specific binding sites. LiGAN (Ragoza et al., 2022) first approaches this problem using a conditional variational autoencoder trained on atomic density grid representations of protein-ligand structures. The molecular structures of ligands are then constructed by atom fitting and bond inference from the generated atom densities. As a preliminary work, LiGAN employs a 3D CNN as the encoder, which does not satisfy the desirable equivariance property. The follow-up works achieve equivariance by leveraging graph neural networks to encode the context information (Luo & Ji, 2021; Liu et al., 2022; Peng et al., 2022). For example, Luo & Ji (2021) use SchNet (Schütt et al., 2017) to encode the 3D context of binding sites and estimate the probability density of atom occurrences in 3D space. The atoms are sampled auto-regressively until there is no room for new atoms.
GraphBP (Liu et al., 2022) adopts the framework of normalizing flows (Rezende & Mohamed, 2015) and constructs local coordinate systems to predict atom types and relative positions. Pocket2Mol (Peng et al., 2022) adopts geometric vector perceptrons (Jing et al., 2021) and the vector-based neural network (Deng et al., 2021) as the context encoder. It also explicitly considers the influence of chemical bonds. However, most of the above methods do not consider the prior knowledge of chemical motifs and may generate molecules with unrealistic or distorted substructures. We note that some recent works (Green et al., 2021; Powers et al., 2022) are similar to our model in that they adopt fragment-based methods for ligand generation/optimization. Green et al. (2021) optimize the binding affinity by predicting fragments to add. Powers et al. (2022) expand a small molecule fragment into a larger drug-like molecule that binds to a given protein pocket. Compared with these methods, FLAG does not require starting fragments and can automatically construct the motif vocabulary. Moreover, the novel structure refinement module enables FLAG to adjust the generated molecules flexibly.

3.1. OVERVIEW

Our goal is to generate valid 3D molecular structures that fit and bind to a specific protein binding site. The 3D geometry of a molecule (i.e., ligand) can be represented as $\mathcal{G} = \{(a_i, r_i)\}_{i=1}^{n}$. Similarly, a binding site of a protein can be defined as a set of atoms $\mathcal{P} = \{(b_j, s_j)\}_{j=1}^{m}$. Here, $n$ and $m$ denote the numbers of atoms in the molecule and in the binding site, respectively. In $\mathcal{G}$ and $\mathcal{P}$, $a_i$ and $b_j$ are one-hot vectors denoting the atom types, and $r_i, s_j \in \mathbb{R}^3$ are the 3D Cartesian coordinates. Formally, our objective is to learn a conditional generative model $p(\mathcal{G} \mid \mathcal{P})$ that captures the conditional distribution of protein-ligand pairs. In our work, we formulate the generation of molecules in the given binding pocket as a sequential generation process. Let $\phi$ be our generation model and $\mathcal{G}^{(t)}$ the intermediate molecule generated at the $t$-th step. The generation process can then be summarized as:

$$\mathcal{G}^{(t)} = \phi(\mathcal{G}^{(t-1)}, \mathcal{P}), \quad t > 1, \tag{1}$$

$$\mathcal{G}^{(1)} = \phi(\mathcal{P}), \quad t = 1.$$

Different from previous works, we generate molecules motif-by-motif, i.e., a set of atoms from the new motif, instead of a single atom, is added to $\mathcal{G}^{(t)}$ at each step. One generation step consists of four main parts (see Figure 1).

In this section, we first introduce the motif extraction procedure in Sec. 3.2. In Sec. 3.3 and Sec. 3.4, we describe the architecture of the encoder and the 3D molecule generation process, respectively. In Sec. 3.5, we derive the training objective and introduce the training scheme in detail.

3.2. MOTIF EXTRACTION

To decompose molecules and extract motifs, a molecule can also be represented as a three-dimensional graph $\mathcal{G} = (\mathcal{V}, \mathcal{R}, \mathcal{E})$, with $\mathcal{V}$ the set of atoms, $\mathcal{R}$ the set of atom coordinates, and $\mathcal{E}$ the set of covalent bonds. A motif $\mathcal{M}_i = (\mathcal{V}_i, \mathcal{R}_i, \mathcal{E}_i)$ is defined as a subgraph of the molecule $\mathcal{G}$. Given a molecule, we extract its motifs $\mathcal{M}_1, \dots, \mathcal{M}_n$ such that their union covers the entire molecular graph: $\mathcal{V} = \bigcup_i \mathcal{V}_i$, $\mathcal{R} = \bigcup_i \mathcal{R}_i$, and $\mathcal{E} = \bigcup_i \mathcal{E}_i$. The motif extraction consists mainly of the following steps:
• First, identify and cut all rotatable bonds whose removal does not violate chemical validity.
• The molecule $\mathcal{G}$ is thereby broken into disconnected fragments $\mathcal{G}_1, \dots, \mathcal{G}_N$.
• We select $\mathcal{G}_i$ as a motif if it occurs more than $\tau$ times in the whole training set. If $\mathcal{G}_i$ is not selected as a motif, we further decompose it into finer rings and bonds and select them as motifs in $\mathcal{G}$.

After preprocessing the whole training dataset, we obtain a vocabulary of motifs $V_{\mathcal{M}}$. The motif extraction and vocabulary construction procedure is illustrated in Figure 2. In the generation process, the geometry of motifs (i.e., bond lengths and angles) can be determined by cheminformatics tools such as RDKit (Bento et al., 2020).
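The frequency-based selection in the last extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each molecule has already been split at its rotatable bonds into fragments identified by canonical strings (for instance SMILES), and both `build_motif_vocabulary` and the `decompose_further` callback are hypothetical names introduced here.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def build_motif_vocabulary(
    fragmented_molecules: Iterable[List[str]],
    tau: int,
    decompose_further: Callable[[str], List[str]],
) -> Set[str]:
    """Select motifs by occurrence count (hypothetical helper).

    fragmented_molecules: one list of fragment identifiers per molecule,
        obtained upstream by cutting rotatable bonds (assumed, not shown).
    tau: occurrence threshold; fragments seen more than tau times are kept.
    decompose_further: assumed callback that splits a rare fragment into
        its constituent rings and bonds.
    """
    molecules = [list(frags) for frags in fragmented_molecules]
    # Count every fragment occurrence over the whole training set.
    counts = Counter(frag for frags in molecules for frag in frags)

    vocabulary: Set[str] = set()
    for frag, count in counts.items():
        if count > tau:
            vocabulary.add(frag)                         # frequent: keep as a motif
        else:
            vocabulary.update(decompose_further(frag))   # rare: use finer pieces
    return vocabulary
```

With a larger `tau`, fewer whole fragments survive and more of the vocabulary consists of individual rings and bonds, which trades vocabulary size against the number of generation steps.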

3.3. ENCODER

To generate molecules conditioned on the binding pocket, it is important to capture the context information with the context encoder. In our work, a 3D context graph $\mathcal{C}^{(t-1)} = \mathcal{G}^{(t-1)} \cup \mathcal{P}$ is first constructed by connecting atoms within a certain cutoff distance. A 3D graph neural network (3DGNN) is employed to encode $\mathcal{C}^{(t-1)}$. The first layer of the encoder is a linear layer that maps atomic attributes to initial embeddings $h_k^{(t,0)}$. It is followed by $L$ feature aggregation layers. The aggregation for each atom $k$ at the $l$-th layer ($1 \le l \le L$) can be formulated as:

$$h_k^{(t,l)} = h_k^{(t,l-1)} + \sum_{u \in N(k)} h_u^{(t,l-1)} \odot \mathrm{MLP}^{l}\big(e_{\mathrm{RBF}}(d_{uk})\big),$$

where $N(k)$ denotes the neighbors of the $k$-th atom in $\mathcal{C}^{(t-1)}$, $\mathrm{MLP}^{l}(\cdot)$ is a multi-layer perceptron, and $\odot$ denotes element-wise multiplication. The embedding of the pairwise distance $d_{uk}$ is obtained with radial basis functions $e_{\mathrm{RBF}}(\cdot)$, such as Gaussian functions (Schlichtkrull et al., 2018) and spherical Bessel functions (Gasteiger et al., 2019). Since our encoder is based only on atom attributes and pairwise distances, it is rotationally and translationally invariant. Note that more advanced 3DGNNs such as DimeNet (Gasteiger et al., 2019) and SphereNet (Liu et al., 2021) can be employed as encoders in future work. We do not use them in the current version of FLAG because of computational efficiency and the GPU memory budget.
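A single aggregation layer of this form can be sketched in NumPy as below. It is an illustrative sketch rather than the FLAG encoder: the Gaussian basis with width `gamma`, the two-layer bias-free MLP given by `W1` and `W2`, and the function names are all assumptions made for the example.

```python
import numpy as np


def gaussian_rbf(d, centers, gamma=10.0):
    """Expand distances d (any shape) into Gaussian radial basis features."""
    return np.exp(-gamma * (d[..., None] - centers) ** 2)


def aggregation_layer(h, coords, W1, W2, centers, cutoff=10.0):
    """One residual message-passing layer over a distance-cutoff graph.

    h:       [N, F] atom embeddings from the previous layer
    coords:  [N, 3] coordinates of all atoms in the context graph
    W1, W2:  weights of a two-layer MLP mapping K RBF features to F channels
             (a stand-in for MLP^l; biases omitted for brevity)
    centers: [K] RBF centers
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                     # [N, N]
    # Neighbors: atoms within the cutoff, excluding the atom itself.
    mask = (dist < cutoff) & ~np.eye(len(h), dtype=bool)
    rbf = gaussian_rbf(dist, centers)                        # [N, N, K]
    filt = np.maximum(rbf @ W1, 0.0) @ W2                    # [N, N, F]
    # messages[k, u] = h_u * filter(d_uk); summed over the neighbors u of k.
    messages = h[None, :, :] * filt * mask[..., None]
    return h + messages.sum(axis=1)
```

Because the layer sees the coordinates only through pairwise distances, rotating or translating all atoms leaves its output unchanged, which is the invariance property stated above.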

3.4. 3D MOLECULE GENERATION

The following procedures are applied to generate the new motif in each generation step.

Focal Motif Selection: To predict the next motif, we first need to select a focal motif to which the next motif attaches. We employ two auxiliary atom-wise classifiers for the selection: a protein atom classifier (for $t = 1$) and a molecule atom classifier (for $t \ge 2$). (1) At the first step ($t = 1$), the only known context information is the binding pocket. The protein atom classifier takes the hidden representations of protein atoms as input and predicts whether new ligand atoms can be generated within 4 Å. (2) For $t \ge 2$, the molecule atom classifier selects a focal atom from the ligand atoms generated in the previous $t - 1$ steps. The motif that the focal atom belongs to is chosen as the focal motif. If no atom/motif is selected as focal, the molecule generation is complete. Overall, the classifiers take the representations of atoms as input and use two MLPs to predict the selection probabilities. We describe how to train these two auxiliary classifiers in Sec. 3.5.

Next Motif Prediction: Given the focal motif $\mathcal{M}_f$, the label of the next motif is predicted as:

$$q = \mathrm{Softmax}\Big(\mathrm{MLP}_M\big(\mathrm{Emb}(\mathcal{M}_f), \textstyle\sum_{i \in \mathcal{M}_f} h_i\big)\Big), \tag{4}$$

where $q$ is the distribution over the motif vocabulary $V_{\mathcal{M}}$, $\mathrm{Emb}(\mathcal{M}_f)$ denotes the embedding of the focal motif, and $\sum_{i \in \mathcal{M}_f} h_i$ is the sum of the atom embeddings in the focal motif. Since there is no focal motif at the first step ($t = 1$), we regard "no motif" as a special motif type and also learn its embedding in training.

Motif Attachments Enumeration and Prediction: With the predicted motif, the next step is to attach the new motif to the generated molecule. Note that this step is not deterministic since there are potentially many attachment configurations (see Figure 1). Our goal here is to select the most appropriate attachment. Specifically, we enumerate the valid attachments and form a candidate set $\mathcal{C}$.
We employ GIN (Xu et al., 2019) as the scoring function $f_a$ over the candidate molecular graphs and train it to select the most appropriate molecule attachment:

$$\mathcal{G}^{(t)} = \arg\max_{\mathcal{G}' \in \mathcal{C}} f_a(\mathcal{G}').$$

Note that by our design, any two motifs share at most two atoms, so we only need to merge at most two atoms or one bond in the process of motif attachment. By pruning chemically invalid molecules and merging isomorphic graphs, we have $|\mathcal{C}| \approx 3$ on the CrossDocked dataset. Therefore, the attachment enumeration and scoring do not burden the computational efficiency.

After selecting the molecule attachment graph, we also need to assemble the new motif with the intermediate molecule in 3D space. Assembling the first motif is challenging since there is no reference motif to attach to and its position relative to the binding pocket is unknown. A straightforward method is to randomly place the first motif near the focal protein atom. However, such a strategy is noisy, and the predicted coordinates can be far from the optimal solution. Following Jin et al. (2022), we use a distance-based initialization strategy, which is more accurate and stable than random initialization. Specifically, a distance matrix $D \in \mathbb{R}^{(n'+m') \times (n'+m')}$ is set as:

$$D_{i,j} = \begin{cases} \lVert s_i - s_j \rVert, & i, j \le n', \\ \mathrm{MLP}_d\big(h_i^{(0)}, h_j^{(0)}\big), & i \le n',\ j > n', \\ \lVert r_i - r_j \rVert, & i, j > n', \end{cases}$$

where $n'$ and $m'$ denote the number of sampled protein atoms used for reference and the number of atoms in the first molecular motif, respectively. The distances among protein atoms and among motif atoms can be calculated directly. For the distances between motif atoms and protein atoms, we use $\mathrm{MLP}_d$ for prediction, with the pairwise atom attributes as input. With the distance matrix $D$, we can obtain the coordinates of atoms by eigenvalue decomposition of its Gram matrix (Crippen & Havel, 1978):

$$\tilde{D}_{i,j} = 0.5\,\big(D_{i,1}^2 + D_{1,j}^2 - D_{i,j}^2\big), \qquad \tilde{D} = U S U^{\top}, \tag{7}$$

where $S$ is a diagonal matrix with eigenvalues in descending order.
The coordinate of each atom $\tilde{r}_i$ is calculated as:

$$\tilde{r}_i = [X_{i,1}, X_{i,2}, X_{i,3}], \qquad X = U\sqrt{S}. \tag{8}$$

Note that even though the predicted coordinates $\{\tilde{r}_i\}$ retain the original distances $D$, they are located in a different coordinate system. Therefore, we apply the Kabsch algorithm (Kabsch, 1976) to find a rigid body transformation $\{R, t\}$ that aligns the predicted protein coordinates $\{\tilde{r}_1, \dots, \tilde{r}_{n'}\}$ with the reference protein coordinates $\{s_1, \dots, s_{n'}\}$. Lastly, the coordinates of the first motif are calculated as:

$$r_i = R\,\tilde{r}_i + t, \qquad i > n'.$$

For generation steps with $t > 1$, the coordinates of the attached motifs are determined and aligned similarly with RDKit (Bento et al., 2020) and the Kabsch algorithm (Kabsch, 1976). The atoms in the focal motif are used as the reference atoms. If the focal motif contains a rotatable bond, the new motif has an additional degree of freedom, and the torsion angle must be further predicted and adjusted.

Rotation Angle Prediction: After attaching the new motif and obtaining the initial coordinates, we apply the encoder again to get the updated hidden representations. Let $X, Y$ denote the two end atoms of the rotatable bond (specifically, let $Y$ denote the atom connecting the new motif). The change of the torsion angle, $\Delta\alpha$, is predicted as:

$$\Delta\alpha = \mathrm{MLP}_{\alpha}(h_X, h_Y, h_G) \bmod 2\pi,$$

where $h_X$ and $h_Y$ are the embeddings of $X$ and $Y$, and $h_G$ is the embedding of the molecule, obtained by sum pooling. Since the predicted angle is based on the representations from the invariant encoder, $\Delta\alpha$ is also rotationally and translationally invariant. Finally, the coordinates of the atoms in the new motif are updated by rotating them by $\Delta\alpha$ around the line $XY$. The implementation details are included in Appendix A.

Structure Refinement: According to the design of our generation scheme, at each step $t > 1$, we fix the coordinates of $\mathcal{G}^{(t-1)}$, and predict and attach the new motif.
However, we find that the generated coordinates may be inaccurate and lead to sub-optimal molecules for the following reasons: (1) the distance-based initialization at $t = 1$ may be inaccurate because the pairwise distances are predicted independently, without considering higher-order interaction terms; (2) at $t > 1$, we fix the coordinates of $\mathcal{G}^{(t-1)}$ and ignore the interactions between $\mathcal{G}^{(t-1)}$ and the new motif. Therefore, we propose to incorporate a structure refinement process at each generation step. The refinement procedure should be equivariant, as the coordinates of $\mathcal{G}^{(t-1)}$ and $\mathcal{P}$ can be rotated and translated arbitrarily. Inspired by force fields in physics (Rappé et al., 1992), we calculate the forces between atoms. Specifically, after placing the new motif at each step $t$, we additionally use our encoder to compute the atom embeddings $h_i^{(t)}$. The force between atoms in $\mathcal{G}^{(t)}$ is calculated as:

$$g_{i,j}^{(t)} = g\big(h_i^{(t)}, h_j^{(t)}, e_{\mathrm{RBF}}(\lVert r_i - r_j \rVert)\big) \cdot \frac{r_i - r_j}{\lVert r_i - r_j \rVert},$$

where $e_{\mathrm{RBF}}(\lVert r_i - r_j \rVert)$ is the radial basis distance encoding, the function $g$ is a feed-forward neural network with a scalar output, and $\frac{r_i - r_j}{\lVert r_i - r_j \rVert}$ is the normalized vector from the $j$-th atom to the $i$-th atom. For the force between $\mathcal{G}^{(t)}$ and $\mathcal{P}$, we only consider the alpha carbons $C_\alpha$ in $\mathcal{P}$ for computational efficiency. Similarly, we have:

$$g_{i,k}^{(t)} = g\big(h_i^{(t)}, h_k^{(t)}, e_{\mathrm{RBF}}(\lVert r_i - s_k \rVert)\big) \cdot \frac{r_i - s_k}{\lVert r_i - s_k \rVert}.$$

Finally, the coordinates of $\mathcal{G}^{(t)}$ are updated as:

$$r_i^{(t)} \leftarrow r_i^{(t)} + \frac{1}{|\mathcal{G}^{(t)}|} \sum_{j \neq i} g_{i,j}^{(t)} + \frac{1}{|\mathcal{P}_{C_\alpha}|} \sum_{k} g_{i,k}^{(t)},$$

where $|\mathcal{G}^{(t)}|$ denotes the number of molecular atoms at step $t$ and $|\mathcal{P}_{C_\alpha}|$ denotes the number of $C_\alpha$ atoms in the binding pocket. In Appendix B, we perform ablation studies to show the effectiveness of structure refinement.
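The coordinate update of the structure refinement can be sketched as below. This is a minimal sketch under assumptions: the scalar outputs of the network $g$ are passed in as precomputed arrays (`g_lig`, `g_prot`) instead of being computed from embeddings, and the function name `refine_coordinates` is introduced here for illustration only.

```python
import numpy as np


def refine_coordinates(r, s_ca, g_lig, g_prot):
    """One refinement step following the coordinate update rule.

    r:      [n, 3] ligand atom coordinates
    s_ca:   [m, 3] coordinates of the pocket alpha carbons
    g_lig:  [n, n] scalar force magnitudes between ligand atoms
    g_prot: [n, m] scalar force magnitudes between ligand and C-alpha atoms
    The magnitudes stand in for the outputs of the network g (assumption).
    """
    n = len(r)
    # Ligand-ligand term: unit vectors pointing from atom j to atom i.
    d_ll = r[:, None, :] - r[None, :, :]
    norm_ll = np.linalg.norm(d_ll, axis=-1) + np.eye(n)   # avoid 0/0 on the diagonal
    off_diag = ~np.eye(n, dtype=bool)                     # exclude j == i
    f_ll = (g_lig * off_diag)[..., None] * d_ll / norm_ll[..., None]

    # Ligand-pocket term: unit vectors pointing from C-alpha k to ligand atom i.
    d_lp = r[:, None, :] - s_ca[None, :, :]
    norm_lp = np.linalg.norm(d_lp, axis=-1, keepdims=True)
    f_lp = g_prot[..., None] * d_lp / norm_lp

    # Average the two force terms and move the ligand atoms.
    return r + f_ll.sum(axis=1) / n + f_lp.sum(axis=1) / len(s_ca)
```

Since the scalar magnitudes are invariant and the direction vectors rotate with the input, the update commutes with rigid motions of ligand and pocket, which is the equivariance requirement stated above.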

3.5. TRAINING

In the training stage, we first extract motifs for molecules with the method described in Sec. 3.2. We then randomly mask the motifs of molecules and train the model to recover the masked ones. Specifically, for each pocket-ligand pair, we sample a mask ratio from the uniform distribution $U[0, 1]$ and mask the corresponding number of structural motifs. The motifs are generated in a breadth-first order, and the root motif is set as the motif closest to the pocket. The atoms that have valence bonds to the masked motifs are defined as focal atom candidates. If all molecular atoms are masked, the focal atoms are defined as protein atoms that have masked ligand atoms within 4 Å. For the motif type prediction, we use a cross-entropy loss for the classification, denoted as $L_{\mathrm{motif}}$. For the focal atom/motif prediction, we use a binary cross-entropy loss $L_{\mathrm{focal}}$ for the classification of focal atoms. For the distance prediction and structure refinement, we minimize an MSE loss $L_d$ with respect to the pairwise distances. We add Gaussian noise to the ground truth coordinates in the training stage and refine the structure with the predicted force. For the motif attachment, we maximize the log-likelihood of predicting the correct molecular graph:

$$L_{\mathrm{attach}} = -f_a(\mathcal{G}_t) + \log \sum_{\mathcal{G}' \in \mathcal{C}/\mathcal{G}_t} \exp\big(f_a(\mathcal{G}')\big),$$

where $\mathcal{G}_t$ denotes the ground truth and $\mathcal{C}$ is the set of possible candidates. As for the torsion angle prediction, we fit the angles with von Mises distributions, with a loss $L_\alpha$ similar to (Senior et al., 2020). In the training process, we aim to minimize the sum of the above loss functions:

$$L = L_{\mathrm{motif}} + L_{\mathrm{focal}} + L_{\mathrm{attach}} + L_d + L_\alpha.$$

4. EXPERIMENTS

4.1. EXPERIMENTAL SETTINGS

Dataset: Following (Luo et al., 2021a) and (Peng et al., 2022), we use the CrossDocked dataset (Francoeur et al., 2020), which contains 22.5 million protein-molecule structures. We filter out data points whose binding pose RMSD is greater than 1 Å and molecules that cannot be sanitized with RDKit (Bento et al., 2020), leading to a refined subset with around 160k data points. We use MMseqs2 (Steinegger & Söding, 2017) to cluster data at 30% sequence identity, and randomly draw 100,000 protein-ligand pairs for training and 100 proteins from the remaining clusters for testing. For evaluation, we randomly sample 100 molecules for each protein pocket in the test set.

Baselines: We compare FLAG with four state-of-the-art baseline methods: one method based on 3D CNNs (LiGAN) (Ragoza et al., 2022) and three methods based on 3D GNNs (AR, GraphBP, and Pocket2Mol) (Luo et al., 2021a; Liu et al., 2022; Peng et al., 2022).

Model: The number of layers $L$ in the context encoder is 6, and the hidden dimension is 256. The model is trained with the Adam optimizer at a learning rate of 0.0001. The batch size is 4, and the total number of training iterations is 1,000,000. The standard deviation of the Gaussian noise added to the ligand coordinates is 0.2. The cutoff distance in the context encoder is set to 10 Å. The threshold $\tau$ in motif extraction is set to 100 in the default setting.

Metrics: We choose metrics that are widely used in previous works (Luo et al., 2021a; Peng et al., 2022; Liu et al., 2022) to evaluate the quality of the sampled molecules: (1) Vina Score measures the binding affinity between the generated molecules and the protein pockets; (2) High Affinity is the percentage of pockets whose generated molecules have higher affinity than the references in the test set; (3) QED measures how likely a molecule is to be a potential drug candidate; (4) Synthesizability (SA) represents the difficulty of drug synthesis (the score is normalized between 0 and 1, and higher values indicate easier synthesis); (5) LogP is the octanol-water partition coefficient (LogP values should be between -0.4 and 5.6 for good drug candidates (Ghose et al., 1999)); (6) Lipinski (Lip.) counts how many rules of Lipinski's rule of five the molecule obeys (Lipinski et al., 2012); (7) Sim. Train represents the Tanimoto similarity (Bajusz et al., 2015) with the most similar molecules in the training set; (8) Diversity (Div.) measures the diversity of the generated molecules for a binding pocket (it is calculated as 1 minus the average pairwise Tanimoto similarity); (9) Time records the time cost to generate 100 valid molecules for a pocket. In our work, the Vina Score is calculated by QVina (Trott & Olson, 2010; Alhossary et al., 2015), and the chemical properties are calculated by RDKit (Bento et al., 2020) over the valid molecules.
Before feeding to Vina, the generated molecular structures are firstly refined by universal force fields (Rappé et al., 1992) .
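As an illustration of the Diversity metric (1 minus the average pairwise Tanimoto similarity), the sketch below operates on fingerprints represented as sets of on-bits. The fingerprint type used in the paper is not specified in this section, so the set representation and the function names are assumptions of the example.

```python
from itertools import combinations
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (a convention of this sketch).
    return len(a & b) / union if union else 1.0


def diversity(fingerprints: List[Set[int]]) -> float:
    """1 - average Tanimoto similarity over all unordered pairs of molecules."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0  # fewer than two molecules: no pairwise diversity to measure
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
```

For 100 sampled molecules per pocket this averages over 4,950 pairs, so the quadratic cost is negligible next to docking.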

4.2. RESULTS

In Table 1, we show the mean values along with the standard deviations of the above metrics. Generally, our method achieves competitive or better performance compared with the baseline methods. Even though our method is not explicitly optimized for binding affinity, FLAG succeeds at generating molecules with higher affinity than the reference molecules on 58% of the binding sites. When it comes to drug potential (QED, SA, LogP, and Lipinski), we observe that FLAG obtains the best results on SA and Lipinski, which indicates that molecules generated by FLAG are more likely to be drug candidates. Moreover, our model manages to generate diverse molecules with the lowest similarities to the molecules in the training dataset. This shows that our model can generalize to new protein pockets and does not just memorize the training dataset. Table 1 also shows the generation time comparisons among the 3D GNN-based methods. Our model has sampling efficiency comparable to GraphBP and is much faster than AR and Pocket2Mol. This is because our method generates molecules fragment-by-fragment and therefore needs far fewer generation steps. Due to page limits, more discussion of molecule generation is given in Appendix B. In Figure 3, we provide two case studies and show several examples of generated 3D molecules that have a higher binding affinity (lower Vina scores) than their corresponding reference molecules. Firstly, we observe that our generated molecules with higher affinity differ substantially in structure from the reference molecules. This further validates that our model can generate diverse and novel molecules that bind to target proteins, instead of just imitating or modifying reference molecules. Moreover, the QED and SA scores are comparable to or even higher than those of the reference molecules.

4.3. SUBSTRUCTURE ANALYSIS

Substructure analysis is needed to further evaluate the generated molecules (Peng et al., 2022). In Table 2, we measure the Kullback-Leibler (KL) divergence between the distributions of the bond angles/dihedral angles in the test set and in the molecules generated by different methods. The lower the value, the better the method captures the distributions of the dataset. In Table 2, we can observe that our method has much lower KL divergences than the baseline methods over nearly all the angle types. This demonstrates that our method preserves the geometric attributes of the data well and generates molecules with realistic substructures. This again shows the advantage of the fragment-wise generation scheme. To further investigate how well our model learns the distribution of substructures, we show the bond length distributions (C-O single bond), bond angle distributions (CCO chains), and two sets of dihedral angle distributions (CCCC and CCCO chains) in Figure 4. We have the following observations. Firstly, the distributions of bond lengths/angles show clear clusters, indicating that the values of bond lengths/angles are largely fixed. This supports the use of chemical priors and tools to determine the bond lengths/angles. Secondly, the distributions of dihedral angles are more complex, and it is difficult to pre-specify them when attaching new motifs. Therefore, we predict and adjust the torsion angle based on the context information in FLAG. Finally, the distributions of the molecules generated by FLAG align well with the training set, indicating that the bond distances/angles and dihedral angles are accurately modeled and reproduced.
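The KL divergence between two angle distributions can be computed along the lines of the sketch below, which uses the one-degree binning described in Appendix A.5 (180 bins for bond angles, 360 bins for dihedral angles). The small smoothing constant `eps` is an assumption of this sketch, since the text does not say how empty bins are handled, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def angle_histogram(angles_deg: Sequence[float], n_bins: int, lo: float,
                    eps: float = 1e-10) -> List[float]:
    """Normalized frequencies over one-degree bins starting at `lo` degrees."""
    if len(angles_deg) == 0:
        raise ValueError("need at least one angle")
    counts = [0] * n_bins
    for a in angles_deg:
        idx = int(math.floor(a - lo))
        counts[min(max(idx, 0), n_bins - 1)] += 1   # clamp the boundary values
    total = sum(counts)
    # eps smoothing (assumption) keeps the logarithm finite for empty bins.
    probs = [c / total + eps for c in counts]
    z = sum(probs)
    return [p / z for p in probs]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i) over matching bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

For bond angles one would call `angle_histogram(angles, n_bins=180, lo=0.0)`, and for dihedral angles `angle_histogram(angles, n_bins=360, lo=-180.0)`, with `p` taken from the test set and `q` from the generated molecules.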

5. CONCLUSION

$$\cdots = \frac{(r_A - r_B) \cdot (r_C - r_B)}{\lVert r_A - r_B \rVert \cdot \lVert r_C - r_B \rVert}.$$

A.3 DEFINITION OF DIHEDRAL ANGLE

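The counter-clockwise (CCW) dihedral angle between two intersecting half-planes ABC and ABD with a common edge AB is calculated as (see the Wikipedia article on the dihedral angle):

$$\angle(ABC, ABD) \overset{\mathrm{def}}{=} \operatorname{atan2}\big(\lVert b_2 \rVert \langle b_1, b_2 \times b_3 \rangle,\ \langle b_1 \times b_2, b_2 \times b_3 \rangle\big),$$

where $b_1 = r_A - r_C$, $b_2 = r_B - r_A$, and $b_3 = r_D - r_B$.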
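The definition translates directly into a few lines of NumPy. The sketch below follows the formula term by term; the function name `dihedral` and the argument order (C, A, B, D) are choices made for this example.

```python
import numpy as np


def dihedral(r_c, r_a, r_b, r_d):
    """Signed dihedral angle (radians) between half-planes ABC and ABD.

    All arguments are length-3 arrays; A and B span the common edge.
    """
    b1 = r_a - r_c
    b2 = r_b - r_a
    b3 = r_d - r_b
    n2 = np.cross(b2, b3)
    y = np.linalg.norm(b2) * np.dot(b1, n2)   # ||b2|| <b1, b2 x b3>
    x = np.dot(np.cross(b1, b2), n2)          # <b1 x b2, b2 x b3>
    return np.arctan2(y, x)
```

Using `arctan2` instead of `arccos` keeps the sign of the angle and stays numerically stable near 0 and π.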

A.4 CANONICAL DEFINITION OF TORSION ANGLE

As introduced by Ganea et al. (2021), the torsion angle can be defined in a canonical way. Let $X, Y$ be the coordinates of the two end atoms and $\{P_i\}$, $\{Q_j\}$ be the coordinates of their neighbors, respectively. Let $\Delta_{ij} \overset{\mathrm{def}}{=} \angle(XYP_i, XYQ_j)$ denote the counter-clockwise dihedral angle and $s_{ij} \overset{\mathrm{def}}{=} [\cos(\Delta_{ij}), \sin(\Delta_{ij})]^{\top}$. Let $c_{ij}$ be real coefficients such that $s \overset{\mathrm{def}}{=} \sum_{i,j} c_{ij} s_{ij} \in \mathbb{R}^2$ is not a null vector. The torsion angle $\alpha$ is defined as:

$$\alpha \overset{\mathrm{def}}{=} \operatorname{atan2}\Big(\frac{s}{\lVert s \rVert}\Big).$$

In our method, instead of predicting the exact value of $\alpha$, we predict the change of the torsion angle $\Delta\alpha$ and update the coordinates with the rotation matrix described in Appendix A.6.

A.5 CALCULATION OF KL DIVERGENCE

To compare two distributions of bond/dihedral angles, we use bins of one degree and calculate the normalized frequencies (180 bins for bond angles and 360 bins for dihedral angles). The KL divergence between two distributions $p(\theta)$ and $q(\theta)$ is calculated as:

$$\mathrm{KL} = \sum_i p(\theta_i) \cdot \log\Big(\frac{p(\theta_i)}{q(\theta_i)}\Big).$$

A.6 DETAILS OF MOTIF ROTATION

In the generation of new motifs, if the focal motif is rotatable and the rotation angle $\Delta\alpha$ is known, we use the following rotation matrix $R_{3 \times 3}$ and translation vector $t_{3 \times 1}$ to update the coordinates of the new motif. Let $X, Y$ denote the two end atoms of the rotatable bond ($Y$ denotes the atom connecting the new motif) and $r_X$ and $r_Y$ be their coordinates. Let $n$ denote the normalized direction vector $\frac{r_Y - r_X}{\lVert r_Y - r_X \rVert}$ and $n_x$, $n_y$, and $n_z$ be its components along the x, y, and z axes. Let $x_0$, $y_0$, and $z_0$ be the three components of $r_X$. The rotation matrix and translation vector are:

$$R_{3 \times 3} = \begin{pmatrix} n_x^2 K + \cos(\Delta\alpha) & n_x n_y K - n_z \sin(\Delta\alpha) & n_x n_z K + n_y \sin(\Delta\alpha) \\ n_x n_y K + n_z \sin(\Delta\alpha) & n_y^2 K + \cos(\Delta\alpha) & n_y n_z K - n_x \sin(\Delta\alpha) \\ n_x n_z K - n_y \sin(\Delta\alpha) & n_y n_z K + n_x \sin(\Delta\alpha) & n_z^2 K + \cos(\Delta\alpha) \end{pmatrix}, \tag{18}$$

$$t_{3 \times 1} = \begin{pmatrix} (x_0 - n_x M) K + (n_z y_0 - n_y z_0) \sin(\Delta\alpha) \\ (y_0 - n_y M) K + (n_x z_0 - n_z x_0) \sin(\Delta\alpha) \\ (z_0 - n_z M) K + (n_y x_0 - n_x y_0) \sin(\Delta\alpha) \end{pmatrix},$$

where $K = 1 - \cos(\Delta\alpha)$ and $M = n_x x_0 + n_y y_0 + n_z z_0$. The coordinates $r_i$ in the motif are updated as $r_i' = R r_i + t$.
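The rotation of a motif around the bond axis can be sketched as follows. The sketch uses the standard axis-angle (Rodrigues) rotation matrix and computes the translation as $t = (I - R)\, r_X$, which expands to the closed-form vector with $K$ and $M$; the function name `rotate_about_bond` is an assumption of this example.

```python
import numpy as np


def rotate_about_bond(coords, r_x, r_y, delta_alpha):
    """Rotate coords [N, 3] by delta_alpha (radians) around the line through r_x and r_y."""
    n = (r_y - r_x) / np.linalg.norm(r_y - r_x)   # unit vector along the bond X -> Y
    nx, ny, nz = n
    c, s = np.cos(delta_alpha), np.sin(delta_alpha)
    K = 1.0 - c
    R = np.array([
        [nx * nx * K + c,      nx * ny * K - nz * s, nx * nz * K + ny * s],
        [nx * ny * K + nz * s, ny * ny * K + c,      ny * nz * K - nx * s],
        [nx * nz * K - ny * s, ny * nz * K + nx * s, nz * nz * K + c],
    ])
    # t = (I - R) r_X, so every point on the bond axis is left unchanged.
    t = r_x - R @ r_x
    return coords @ R.T + t
```

Only the atoms of the newly attached motif would be passed in as `coords`; the two bond atoms lie on the axis and are therefore fixed by construction.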

A.7 MORE DETAILS OF MOTIF ATTACHMENT

To determine the 3D coordinates of the new motif and attach it to the generated molecule, we leverage RDKit and the Kabsch algorithm. The Kabsch algorithm (Kabsch, 1976) is a widely used method for calculating the optimal rotation matrix that minimizes the RMSD (root mean squared deviation) between two paired sets of points. The atoms in the focal motif are selected as the reference atoms. We first use RDKit to obtain the 3D coordinates of the new motif and the reference atoms. Since these coordinates and the coordinates of the intermediate molecule G (t-1) are expressed in different coordinate frames, we employ the Kabsch algorithm to align them. Finally, we update the molecule coordinates by appending the aligned coordinates of the new motif.
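The alignment step can be sketched with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' pipeline: the RDKit coordinate generation is omitted, and the function names `kabsch` and `attach_motif` as well as the argument layout (reference atoms given once in the local frame and once in the molecule frame) are hypothetical.

```python
import numpy as np

def kabsch(P, Q):
    """Return (R, t) minimizing the RMSD between P @ R.T + t and Q.

    P, Q: paired point sets of shape (k, 3).
    """
    p_c, q_c = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_c).T @ (Q - q_c)                  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_c - R @ p_c
    return R, t

def attach_motif(mol_coords, ref_in_mol, ref_in_local, motif_local):
    """Align the local frame to the molecule frame using the reference atoms,
    then append the transformed motif coordinates to the molecule."""
    R, t = kabsch(ref_in_local, ref_in_mol)
    motif_in_mol = motif_local @ R.T + t
    return np.vstack([mol_coords, motif_in_mol])
```

Using the focal-motif atoms as the paired reference set means the new motif inherits the frame of the existing molecule, and the determinant check keeps the chirality of the motif intact.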

B.1 MORE ANALYSIS OF SUBSTRUCTURES

In Table 3, we show the ratios of molecules containing different rings in the test set and in the molecules generated by different methods. We observe that the ring-size ratios of the molecules generated by FLAG are the closest to those of the test set. This again shows the advantage of FLAG in learning and reproducing the distributions of substructures in the dataset.

B.2 ABLATION STUDIES

In Table 4, we show the results of ablation studies on the structure refinement module (FLAG w/o SR) and the torsion angle module (FLAG w/o T). Specifically, in FLAG w/o SR, we remove the structure refinement in each generation step; in FLAG w/o T, we do not further rotate the motif for adjustment. We observe that the full FLAG achieves the best performance on binding affinity and drug-likeness metrics (QED, SA, LogP, Lip.), verifying the effectiveness of the two modules.



¹ https://github.com/mattragoza/LiGAN
² https://github.com/luost26/3D-Generative-SBDD
³ https://github.com/divelab/GraphBP
⁴ https://github.com/pengxingang/Pocket2Mol
⁵ https://en.wikipedia.org/wiki/Dihedral_angle



Figure 1: An illustration of one generation step of FLAG. There are four main parts, whose details are given in Sec. 3.

(a) context encoding and focal motif selection, (b) next motif prediction, (c) motif attachments enumeration and selection, and (d) rotation angle prediction and structure refinement (Figure 1). The details are given in the rest of this section.

Figure 2: Illustration of motif extraction. More details are given in Sec. 3.2.

Figure 3: Examples of generated 3D molecules that have higher binding affinity than reference molecules. Lower Vina score indicates higher binding affinity.

Figure 4: Distributions for C-O bond lengths (a), CCO bond angles (b), CCCC dihedral angles (c), and CCCO dihedral angles (d) in the training set and generated molecules.

Comparing the molecular properties of the test set and the generated molecules by different methods. The best results are bolded.

The KL divergence of the bond angles (upper part) and dihedral angles (lower part) between the test set and the generated molecules. Lowercase letters denote atoms in aromatic rings.

To implement the baselines including LiGAN¹, AR², GraphBP³, and Pocket2Mol⁴, we use the open-source codes and follow their default settings. The methods are trained on the same data split for a fair comparison. All the experiments are conducted on Ubuntu Linux with V100 GPUs.

$-\,r_B\|$, where $r_A$ and $r_B$ denote the 3D coordinates. For a bonded chain ABC, the bond angle is defined as $\cos\angle ABC$

Table 3: The ratio of the molecules containing different rings in the test set and those generated by different methods. The results closest to the ratios of the test set are bolded.

ACKNOWLEDGEMENTS

This research was partially supported by a grant from the National Natural Science Foundation of China (Grant No. 61922073).

AVAILABILITY

Our code is publicly available at https://github.com/zaixizhang


Published as a conference paper at ICLR 2023

Table 4: Ablation studies of the structure refinement module and the torsion angle module. FLAG w/o SR and FLAG w/o T denote removing the structure refinement and the torsion angle prediction and rotation procedure, respectively.

Methods | Vina Score (kcal/mol, ↓)

Moreover, with these two modules, FLAG can generate more unique and diverse molecules. Admittedly, structure refinement and torsion prediction add computational overhead. However, the overhead is acceptable considering the gain in generation performance.

[Figure 5. Axis: Vina Score (kcal/mol). Legend: FLAG, FLAG w/o SR, FLAG w/o T.]

Here, we mainly discuss the influence of the hyperparameter τ on the generation performance. In Figure 5, we vary τ from 0 to 500 and show the average Vina scores of the molecules generated by FLAG and its variants. Lower Vina scores indicate higher binding affinity and better generation quality. We have the following observations. Firstly, our methods are robust to the choice of τ. For example, FLAG has its largest Vina score at τ = 0, which is only 0.2 larger than its lowest Vina score at τ = 100. Moreover, an appropriate value of τ is important for FLAG to achieve the best performance. A lower τ typically yields a larger motif vocabulary with redundant motifs. On the other hand, a larger τ yields a smaller motif vocabulary with finer extracted motifs (e.g., single bonds), which may lead to too many generation steps and invalid generated molecules. Finally, an interesting observation is that FLAG w/o T performs best at τ = 500, probably because finer motifs offer more flexibility when the torsion angle module is removed.

B.4 GENERATED MOLECULES WITH LOWER AFFINITIES THAN THE REFERENCE

In Figure 6, we show a pocket (3tym) where FLAG fails to generate molecules with higher affinities than the reference. This is probably due to the following reasons: (1) The geometry of this binding pocket (3tym) is more complex, and some generated molecules fail to occupy the whole pocket as the reference molecule does. (2) Some generated molecules accidentally collide with the pocket, which is unrealistic in nature.

[Figure panel label: Ours (3u9f)]

