ROTAMER DENSITY ESTIMATOR IS AN UNSUPERVISED LEARNER OF THE EFFECT OF MUTATIONS ON PROTEIN-PROTEIN INTERACTION

Abstract

Protein-protein interactions are crucial to many biological processes, and predicting the effect of amino acid mutations on binding is important for protein engineering. While data-driven approaches using deep learning have shown promise, the scarcity of annotated experimental data remains a major challenge. In this work, we propose a new approach that predicts mutational effects on binding using the change in conformational flexibility of the protein-protein interface. Our approach, named Rotamer Density Estimator (RDE), employs a flow-based generative model to estimate the probability distribution of protein side-chain conformations and uses entropy to measure flexibility. RDE is trained solely on protein structures and does not require the supervision of experimental values of changes in binding affinities. Furthermore, the unsupervised representations extracted by RDE can be used for downstream neural network predictions with even greater accuracy. Our method outperforms empirical energy functions and other machine learning-based approaches.

1. INTRODUCTION

Proteins rarely act alone and usually interact with other proteins to perform a diverse range of biological functions (Alberts & Miake-Lye, 1992; Kastritis & Bonvin, 2013) . For example, antibodies, a type of immune system protein, recognize and bind to proteins on pathogens' surfaces, eliciting immune responses by interacting with the receptor protein of immune cells (Lu et al., 2018) . Given the importance of protein-protein interactions in many biological processes, developing methods to modulate these interactions is critical. A common strategy to modulate protein-protein interactions is to mutate amino acids on the interface: some mutations enhance the strength of binding, while others weaken or even disrupt the interaction (Gram et al., 1992; Barderas et al., 2008) . Biologists may choose to increase or decrease binding strength depending on their specific goals. For example, enhancing the effect of a neutralizing antibody against a virus usually requires increasing the binding strength between the antibody and the viral protein. However, the combinatorial space of amino acid mutations is large, so it is not always feasible or affordable to conduct wet-lab assays to test all viable mutations. Therefore, computational approaches are needed to guide the identification of desirable mutations by predicting their mutational effects on binding strength, typically measured by the change in binding free energy (∆∆G). Traditional computational approaches are mainly based on biophysics and statistics (Schymkowitz et al., 2005; Park et al., 2016; Alford et al., 2017) . Although these methods have dominated the field for years, they have several limitations. Biophysics-based methods face a trade-off between efficiency and accuracy since they rely on sampling from energy functions. Statistical methods are more efficient, but their capacity is limited by the descriptors considered in the model. 
Furthermore, both biophysics- and statistics-based methods rely heavily on human knowledge, which prevents them from benefiting from the growing availability of protein structures. As a result, predicting the effects of mutations on protein-protein binding remains an open problem.

Recently, deep learning has shown significant promise in modeling proteins, making data-driven approaches more attractive than ever (Rives et al., 2019; Jumper et al., 2021). However, developing deep learning-based models to predict mutational effects on protein-protein binding is challenging due to the scarcity of experimental data: only a few thousand protein mutations annotated with changes in binding affinity are publicly available (Geng et al., 2019b), making supervised learning prone to overfitting. Another difficulty is the absence of structures for mutated protein-protein complexes. Mutating amino acids in a complex mainly changes sidechain conformations (Najmanovich et al., 2000; Gaudreault et al., 2012), which contribute to the change in binding free energy, but the exact conformational changes upon mutation are unknown.

In this work, we draw inspiration from the thermodynamic principle that protein-protein binding usually leads to entropy loss at the binding interface, which can be used to determine binding affinity (Brady & Sharp, 1997; Cole & Warwicker, 2002; Kastritis & Bonvin, 2013). When two proteins bind, the residues located at the interface tend to become less flexible (i.e., to have lower entropy) due to the physical and geometric constraints imposed by the binding partner (Figure 1). A greater entropy loss corresponds to a stronger binding affinity. Therefore, by comparing the entropy losses of wild-type and mutated protein complexes, we can estimate the effect of mutations on binding affinity. Please refer to Section B in the appendix for a detailed discussion.
Based on this principle, we introduce a novel approach to predicting the impact of amino acid mutations on protein-protein interaction. The core of our method is the Rotamer Density Estimator (RDE), a conditional generative model that estimates the density of amino acid sidechain conformations (rotamers). We use the entropy of the estimated density as a metric of conformational flexibility. By subtracting the entropy of the separated proteins from the entropy of the complex, we obtain an estimate of binding affinity. Finally, we assess the effect of mutations by comparing the estimated binding affinities of wild-type and mutant protein complexes. In addition to directly comparing entropies, we also employ neural networks to predict ∆∆G from the representations learned by RDE.

Our method addresses the aforementioned challenges. The Rotamer Density Estimator is trained solely on protein structures and requires no other labels, making it an unsupervised learner of mutational effects on protein-protein interaction; this mitigates the scarcity of annotated mutation data. Moreover, our method does not require the mutated protein structure as input; instead, it treats mutated structures as latent variables, which are approximated by RDE. Our method outperforms both empirical energy functions and machine learning models for predicting ∆∆G. Additionally, as a generative model for rotamers, RDE accurately predicts sidechain conformations.

2. RELATED WORK

2.1. MUTATIONAL EFFECT PREDICTION FOR PROTEIN-PROTEIN BINDING

Traditional approaches to predicting mutational effects on binding are based on biophysics and statistics and, as discussed above, face a trade-off between speed and accuracy. Their performance depends heavily on human expertise, which limits their ability to improve as the number of available protein structures grows rapidly. Recently, deep learning-based approaches have emerged, which we group into three categories: end-to-end models, pre-training-based models, and unsupervised models. End-to-end models directly predict the difference in binding free energy by taking both mutant and wild-type protein structures as input (Shan et al., 2022).
Pre-training-based models attempt to address data scarcity by pre-training a feature-extraction network (Liu et al., 2021; Yang et al., 2022; Zhang et al., 2022). However, most pre-training tasks are not designed to capture the underpinnings of protein-protein interactions. Unsupervised models apply the mask-predict paradigm to protein 3D structures, partially masking amino acid types on a given protein backbone and recovering the masked information with neural networks (Wang et al., 2018; Shroff et al., 2020; Jing et al., 2020; Yang et al., 2022; Hsu et al., 2022). These models can serve as unsupervised predictors of mutational effects on binding, as the difference in the predicted probability of amino acid types before and after mutation correlates mildly with the change in binding free energy.

2.2. MUTATIONAL EFFECT PREDICTION FOR SINGLE PROTEINS

The prediction of mutational effects for single proteins can be approached with either structure-based or sequence-based (evolution-based) methods. Structure-based methods, which can be categorized into biophysical, statistical, and deep learning-based approaches, aim to predict the thermal stability or fitness of a protein rather than the binding free energy between proteins (Schymkowitz et al., 2005; Park et al., 2016; Alford et al., 2017; Lei et al., 2023). Sequence-based methods mine evolutionary history, either by performing statistics on multiple sequence alignments (MSAs) constructed from large-scale sequence databases (Hopf et al., 2017; Riesselman et al., 2018; Rao et al., 2021; Luo et al., 2021; Frazer et al., 2021) or by leveraging protein language models (PLMs) (Meier et al., 2021; Notin et al., 2022). However, sequence-based methods are not suitable for predicting mutational effects on general protein-protein interactions, because the necessary evolutionary signal is often missing: interactions typically involve two or more chains, which may belong to different species or may not undergo inter-chain co-evolution. As such, it is infeasible to predict such mutational effects by mining sequence databases with existing tools such as MSAs or PLMs, and effective prediction relies on structure-based approaches rather than sequences alone. We present a detailed discussion and supporting experimental results in Section C.1 of the appendix.

3. METHOD

3.1. OVERVIEW AND PRELIMINARIES

Overview Our method comprises three main components. The first is the Rotamer Density Estimator (RDE), a conditional normalizing flow that models the probability density of sidechain conformations (rotamers) given the amino acid types and the backbone structure (Section 3.2). The second is an algorithm that estimates the entropy of the distribution parameterized by the normalizing flow (Section 3.3). Lastly, we describe how we use the entropies of the protein-protein interface in both the mutated and wild-type states, both bound and unbound, to estimate the change in binding free energy (∆∆G), and how neural networks can produce more accurate ∆∆G predictions from the unsupervised representations learned by RDE (Section 3.4).

Definitions and Notations A protein-protein complex is a multi-chain protein structure that can be divided into two groups. Each group contains at least one protein chain, and each chain consists of multiple (amino acid) residues. For a protein complex containing n residues, we number them from 1 to n; the two groups of the complex are represented by two disjoint index sets A, B ⊂ {1, . . . , n}. A residue is characterized by its type, position, orientation, and sidechain conformation. We denote the type, position, and orientation of the i-th residue (i ∈ {1, . . . , n}) by a_i ∈ {1, . . . , 20}, p_i ∈ R^3, and O_i ∈ SO(3), respectively. The sidechain conformation of a residue is called its rotamer. As the conformational degrees of freedom of the sidechain are defined by its rotatable bonds, a rotamer can be parameterized by torsional angles about those bonds. The number of torsional angles varies from 0 to 4 depending on the residue type. For a residue with d torsional angles, we denote the k-th torsional angle (k ∈ {1, . . . , d}) by χ_i^{(k)} ∈ [0, 2π). Collectively, the torsional angles are denoted by the vector χ_i = (χ_i^{(k)})_{k=1}^{d}.
Using the language of geometry, an angle can be represented by a point on the unit circle S^1, and a vector of d angular values resides on the product of d unit circles, known as the d-dimensional torus T^d = (S^1)^d. In this work, our first goal is to model the conditional probability density of a rotamer given the type, position, orientation, and prior rotamer of itself and all other residues: p(χ_i | {a_j, p_j, O_j, χ̂_j}_{j=1}^{n}), where the hat marks prior rotamers. The prior rotamers χ̂_j are often inaccurate or unknown. For example, if we mutate some residues, the rotamers of the mutated residues are unknown, and the rotamers of residues near the mutated ones are inaccurate because they are affected by the mutation. The probability density is defined over the d-dimensional torus T^d = (S^1)^d, and we describe below the flow-based architecture used to model it.
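As a concrete illustration of this angular representation, the minimal sketch below (helper names are ours, not from the paper) embeds a vector of torsional angles as a point on (S^1)^d and computes the wrap-aware distance between two angles:

```python
import math

def embed_torus(chi):
    """Embed a vector of torsional angles (radians) as a point on the torus
    (S^1)^d: one (cos, sin) pair per angle, so periodicity is built in."""
    return [(math.cos(a), math.sin(a)) for a in chi]

def angular_distance(a, b):
    """Geodesic distance between two angles on S^1, always in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)
```

For example, `angular_distance(0.1, 2 * math.pi - 0.1)` is 0.2, not the naive difference 2π - 0.2, which is why angles must be treated as points on the circle rather than real numbers.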

3.2. ROTAMER DENSITY ESTIMATOR

Rotamer Density Estimator (RDE) is designed to estimate the conditional distribution of the rotamer of the i-th residue, given the information of itself and all other residues: p(χ_i | {a_j, p_j, O_j, χ̂_j}_{j=1}^{n}). In this section, we first introduce the encoder network that produces a hidden representation for each residue, taking into account its environment {a_j, p_j, O_j, χ̂_j}_{j=1}^{n}. Next, we present a conditional normalizing flow on S^1 for rotamers with a single torsional angle. Finally, we extend the flow to the D-dimensional torus T^D (D > 1) for rotamers with more than one torsional angle.

Encoder Network The encoder network starts with two multi-layer perceptrons (MLPs) that generate embeddings for single residues and for residue pairs, respectively. The single-residue MLP encodes the residue type, backbone dihedral angles, and local atom coordinates into a vector e_i (i = 1, . . . , n). The pair MLP encodes the distance and the relative position between two residues into a pair embedding z_ij (i, j = 1, . . . , n). To transform the single embeddings e_i and pair embeddings z_ij into hidden representations h_i, we use a self-attention-based network that is invariant to rotation and translation (Jumper et al., 2021). The hidden representation h_i captures both the information of the i-th residue itself and its structural environment; it serves as an encoding of the condition {a_j, p_j, O_j, χ̂_j}_{j=1}^{n} for the probability density of χ_i.

Conditional Flow on S^1 To model the distribution of a rotamer with one torsional angle, we use a conditional normalizing flow on S^1 (Rezende et al., 2020). A normalizing flow is a bijective function; to construct one on S^1, we parameterize a point on the circle by an angle θ ∈ [0, 2π) and define a bijective function on [0, 2π] that is equivalent to a bijection on S^1.
A common method for constructing a bijective function is to ensure strict monotonicity, so we construct a strictly monotonically increasing function on [0, 2π], denoted by f : [0, 2π] → [0, 2π]. Notably, due to the periodicity of angular values, 0 and 2π are the same point, so to preserve continuity at both ends we must ensure that f(0) = 0, f(2π) = 2π, and f'(0) = f'(2π). To achieve this, we use the rational quadratic spline flow (Durkan et al., 2019; Rezende et al., 2020), a piecewise function that contains K pieces delimited by K + 1 knots. Each piece takes the form

f_k(x \mid x_{k,k+1}, y_{k,k+1}, \delta_{k,k+1}) = y_k + \frac{(y_{k+1} - y_k)\left[ s_k\,\xi_k^2(x) + \delta_k\,\xi_k(x)\,(1 - \xi_k(x)) \right]}{s_k + \left[ \delta_{k+1} + \delta_k - 2 s_k \right] \xi_k(x)\,(1 - \xi_k(x))},  (1)

where s_k = (y_{k+1} - y_k) / (x_{k+1} - x_k) and ξ_k(x) = (x - x_k) / (x_{k+1} - x_k) for x ∈ [x_k, x_{k+1}]. The spline is parameterized by the coordinates and derivatives of the K + 1 knots, denoted x_k, y_k, and δ_k (k = 1, . . . , K + 1). To fulfill the requirements of monotonicity, continuity, and periodicity, we impose the following constraints on the knots: (i) 0 = x_1 < x_2 < · · · < x_{K+1} = 2π; (ii) 0 = y_1 < y_2 < · · · < y_{K+1} = 2π; (iii) δ_k > 0 (k = 1, . . . , K + 1); and (iv) δ_1 = δ_{K+1}. These parameters are produced by transforming the hidden representation h_i of the residue with neural networks, so the probability distribution defined by the bijection is conditioned on the residue and its environment; we therefore also write the spline as f(x | h_i). We choose the uniform distribution on [0, 2π] as the base distribution, p_z(z) = 1/(2π) for z ∈ [0, 2π], and let f map the target distribution to the base distribution. By the change-of-variable formula, the rotamer density of the i-th residue is

log p(x | h_i) = log p_z(f(x)) + log |f'(x | h_i)| = -log 2π + log |f'(x | h_i)|.
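The spline can be evaluated directly from its definition. The sketch below (ours; scalar rather than batched, with illustrative knot values) evaluates one monotone rational quadratic spline and its analytic derivative:

```python
import bisect
import math

def rq_spline(x, xs, ys, deltas):
    """Monotone rational quadratic spline on [0, 2*pi] (Durkan et al., 2019).
    xs, ys: strictly increasing knot coordinates with xs[0] = ys[0] = 0 and
    xs[-1] = ys[-1] = 2*pi; deltas: positive knot derivatives with
    deltas[0] == deltas[-1] for periodic continuity. Returns (f(x), f'(x))."""
    k = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)
    k = max(k, 0)
    w = xs[k + 1] - xs[k]
    s = (ys[k + 1] - ys[k]) / w          # slope of the bin
    xi = (x - xs[k]) / w                 # position within the bin, in [0, 1]
    denom = s + (deltas[k + 1] + deltas[k] - 2 * s) * xi * (1 - xi)
    y = ys[k] + (ys[k + 1] - ys[k]) * (s * xi * xi + deltas[k] * xi * (1 - xi)) / denom
    dy = s * s * (deltas[k + 1] * xi * xi + 2 * s * xi * (1 - xi)
                  + deltas[k] * (1 - xi) ** 2) / denom ** 2
    return y, dy
```

With knots satisfying constraints (i)-(iv), the resulting map fixes the endpoints, is strictly increasing, and its analytic derivative agrees with a finite difference, which is what makes exact likelihoods cheap.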
= " > A A A C F n i c b V D L S g M x F M 3 U V x 1 f V Z d u g q X g q s y U o i 4 L b l x W 6 A s 6 Y 8 m k m T Y 0 y Q x J R i j D / I a 4 0 y 9 x J 2 7 d + i H u z b S z s K 0 H A o d z 7 s 0 9 n C B m V G n H + b Z K W 9 s 7 u 3 v l f f v g 8 O j 4 p H J 6 1 l N R I j H p 4 o h F c h A g R R g V p K u p Z m Q Q S 4 J 4 w E g / m N 3 l f v + J S E U j 0 d H z m P g c T Q Q N K U b a S J 7 H k Z 4 G Q d r J H h u j S t W p O w v A T e I W p A o K t E e V H 2 8 c 4 Y Q T o T F D S g 1 d J 9 Z + i q S m m J H M 9 h J F Y o R n a E K G h g r E i f L T R e Y M 1 o w y h m E k z R M a L t S / G y n i S s 1 5 Y C b z j G r d y 8 X / v G G i w 1 s / p S J O N B F 4 e S h M G N Q R z A u A Y y o J 1 m x u C M K S m q w Q T 5 F E W J u a b L s G c a J 0 x G H x 3 + r h g G e 2 6 c l d b 2 W T 9 B p 1 9 7 r e f G h W W 0 7 R W B l c g E t w B V x w A 1 r g H r R B F 2 A Q g 2 f w C t 6 s F + v d + r A + l 6 M l q 9 g 5 B y u w v n 4 B x 4 S e + g = = < / l a t e x i t > T 2 Rotamer Distribution on < l a t e x i t s h a 1 _ b a s e 6 4 = " 8 p U 4 A 4 N h 9 R X + r j / k M C N L k P x n T a 0  = " > A A A C F n i c b V D L S g M x F M 3 U V x 1 f V Z d u g q X g q s y U o i 4 L b l x W 6 A s 6 Y 8 m k m T Y 0 y Q x J R i j D / I a 4 0 y 9 x J 2 7 d + i H u z b S z s K 0 H A o d z 7 s 0 9 n C B m V G n H + b Z K W 9 s 7 u 3 v l f f v g 8 O j 4 p H J 6 1 l N R I j H p 4 o h F c h A g R R g V p K u p Z m Q Q S 4 J 4 w E g / m N 3 l f v + J S E U j 0 d H z m P g c T Q Q N K U b a S J 7 H k Z 4 G Q d r J H h u j S t W p O w v A T e I W p A o K t E e V H 2 8 c 4 Y Q T o T F D S g 1 d J 9 Z + i q S m m J H M 9 h J F Y o R n a E K G h g r E i f L T R e Y M 1 o w y h m E k z R M a L t S / G y n i S s 1 5 Y C b z j G r d y 8 X / v G G i w 1 s / p S J O N B F 4 e S h M G N Q R z A u A Y y o J 1 m x u C M K S m q w Q T 5 F E W J u a b L s G c a J 0 x G H x 3 + r h g G e 2 6 c l d b 2 W T 9 B p 1 9 7 r e f G h W W 0 7 R W B l c g E t w B V x w A 1 r g H r R B F 2 A Q g 2 f w C t 6 s F 
+ v d + r A + l 6 M l q 9 g 5 B y u w v n 4 B x 4 S e + g = = < / l a t e x i t > T 2 < l a t e x i t s h a 1 _ b a s e 6 4 = " 8 T p + F G o 4 S z s m x R u d T 8 j y H t 8 S h F M = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L i s Y B / Q D i W T Z t r Y J D M k G a E M / Q d x p 1 / i T t z 6 B 3 6 I e 9 N 2 F r b 1 Q O B w z n 3 l h A l n 2 n j e t 1 P Y 2 N z a 3 i n u u n v 7 B 4 d H p e O T l o 5 T R W i T x D x W n R B r y p m k T c M M p 5 1 E U S x C T t v h + H b m t 5 + o 0 i y W D 2 a S 0 E D g o W Q R I 9 h Y q d U j I 9 b 3 + 6 W y V / X m Q O v E z 0 k Z c j T 6 p Z / e I C a p o N I Q j r X u + l 5 i g g w r w w i n U 7 e X a p p g M s Z D 2 r V U Y k F 1 k M 2 v n a K K V Q Y o i p V 9 0 q C 5 + r c j w 0 L r i Q h t p c B m p F e 9 m f i f 1 0 1 N d B N k T C a p o Z I s F k U p R y Z G s 6 + j A V O U G D 6 x B B P F 7 K 2 I j L D C x N i A X L e C S K p N L F A + b 3 l x K K a u z c l f T W W d t C 6 r / l W 1 d l 8 r 1 7 0 8 s S K c w T l c g A / X U I c 7 a E A T C D z C M 7 z C m / P i v D s f z u e i t O D k P a e w B O f r F 6 h / n D g = < / l a t e x i t > 1 < l a t e x i t s h a 1 _ b a s e 6 4 = " o I Z U 3 V w 5 I C W k M w b v u Y O R v o t 3 t E s = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z J T i r o s u H F Z w T 6 g H U o m z b S x S W Z I M k I Z + g / i T r / E n b j 1 D / w Q 9 6 b t L L T 1 Q O B w z n 3 l h A l n 2 n j e l 1 P Y 2 N z a 3 i n u u n v 7 B 4 d H p e O T t o 5 T R W i L x D x W 3 R B r y p m k L c M M p 9 1 E U S x C T j v h 5 G b u d x 6 p 0 i y W 9 2 a a 0 E D g k W Q R I 9 h Y q d 0 n Y z a o D U p l r + o t g N a J n 5 M y 5 G g O S t / 9 Y U x S Q a U h H G v d 8 7 3 E B B l W h h F O Z 2 4 / 1 T T B Z I J H t G e p x I L q I F t c O 0 M V q w x R F C v 7 p E E L 9 X d H h o X W U x H a S o H N W K 9 6 c / E / r 5 e a 6 D r I m E x S Q y V Z L o p S j k y M 5 l 9 H Q 6 Y o M X x q C S a K 2 V s R G W O F i b E B u W 4 F k V S b W K B 8 
3 t / F o Z i 5 N i d / N Z V 1 0 q 5 V / c t q / a 5 e b n h 5 Y k U 4 g 3 O 4 A B + u o A G 3 0 I Q W E H i A J 3 i B V + f Z k = " > A A A C C 3 i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L h s w T 6 g H U o m z b S h y W R I M k I Z + g X i T r / E n b j 1 I / w Q 9 6 b t L G z r g c D h n P v K C R P O t P G 8 b 6 e w t b 2 z u 1 f c d w 8 O j 4 5 P S q d n b S 1 T R W i L S C 5 V N 8 S a c h b T l m G G 0 2 6 i K B Y h p 5 1 w c j / 3 O 0 9 U a S b j R z N N a C D w K G Y R I 9 h Y q R k N S m W v 6 i 2 A N o m f k z L k a A x K P / 2 h J K m g s S E c a 9 3 z v c Q E G V a G E U 5 n b j / V N M F k g k e 0 Z 2 m M B d V B t j h 0 h i p W G a J I K v t i g x b q 3 4 4 M C 6 2 n I r S V A p u x X v f m 4 n 9 e L z X R X Z C x O E k N j c l y U Z R y Z C S a / x o N m a L E 8 K k l m C h m b 0 V k j B U m x m b j u h V E U m 2 k Q P m 8 1 c W h m L k 2 J 3 8 9 l U 3 S v q 7 6 N 9 V a s 1 a u e 3 l i R b i A S 7 g C H 2 6 h D g / Q g B Y Q o P A M r / D m v D j v z o f z u S w t O H n P O a z A + f o F D w S a T A = = < / l a t e x i t > f < l a t e x i t s h a 1 _ b a s e 6 4 = " 8 T p + F G o 4 S z s m x R u d T 8 j y H t 8 S h F M = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L i s Y B / Q D i W T Z t r Y J D M k G a E M / Q d x p 1 / i T t z 6 B 3 6 I e 9 N 2 F r b 1 Q O B w z n 3 l h A l n 2 n j e t 1 P Y 2 N z a 3 i n u u n v 7 B 4 d H p e O T l o 5 T R W i T x D x W n R B r y p m k T c M M p 5 1 E U S x C T t v h + H b m t 5 + o 0 i y W D 2 a S 0 E D g o W Q R I 9 h Y q d U j I 9 b 3 + 6 W y V / X m Q O v E z 0 k Z c j T 6 p Z / e I C a p o N I Q j r X u + l 5 i g g w r w w i n U 7 e X a p p g M s Z D 2 r V U Y k F 1 k M 2 v n a K K V Q Y o i p V 9 0 q C 5 + r c j w 0 L r i Q h t p c B m p F e 9 m f i f 1 0 1 N d B N k T C a p o Z I s F k U p R y Z G s 6 + j A V O U G D 6 x B B P F 7 K 2 I j L D C x N i A X L e C S K p N L F A + b 3 l x K K a u z c l f T W W d t C 6 r / l W 1 d l 8 r 1 7 
0 8 s S K c w T l c g A / X U I c 7 a E A T C D z C M 7 z C m / P i v D s f z u e i t O D k P a e w B O f r F 6 h / n D g = < / l a t e x i t > 1 < l a t e x i t s h a 1 _ b a s e 6 4 = " o I Z U 3 V w 5 I C W k M w b v u Y O R v o t 3 t E s = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z J T i r o s u H F Z w T 6 g H U o m z b S x S W Z I M k I Z + g / i T r / E n b j 1 D / w Q 9 6 b t L L T 1 Q O B w z n 3 l h A l n 2 n j e l 1 P Y 2 N z a 3 i n u u n v 7 B 4 d H p e O T t o 5 T R W i L x D x W 3 R B r y p m k L c M M p 9 1 E U S x C T j v h 5 G b u d x 6 p 0 i y W 9 2 a a 0 E D g k W Q R I 9 h Y q d 0 n Y z a o D U p l r + o t g N a J n 5 M y 5 G g O S t / 9 Y U x S Q a U h H G v d 8 7 3 E B B l W h h F O Z 2 4 / 1 T T B Z I J H t G e p x I L q I F t c O 0 M V q w x R F C v 7 p E E L 9 X d H h o X W U x H a S o H N W K 9 6 c / E / r 5 e a 6 D r I m E x S Q y V Z L o p S j k y M 5 l 9 H Q 6 Y o M X x q C S a K 2 V s R G W O F i b E B u W 4 F k V S b W K B 8 3 t / F o Z i 5 N i d / N Z V 1 0 q 5 V / c t q / a 5 e b n h 5 Y k U 4 g 3 O 4 A B + u o A G 3 0 I Q W E H i A J 3 i B V + f Z I R q L u c = " > A A A C E H i c b V D L S g M x F M 3 U V x 1 f V Z d u g q X g q s y I q M u C L l x W s A 9 o h 5 J J M 2 1 s M h m S O 0 I Z + g / i T r / E n b j 1 D / w Q 9 6 b t L G z r g c D h n P v K C R P B D X j e t 1 N Y W 9 / Y 3 C p u u z u 7 e / s H p c O j p l G p p q x B l V C 6 H R L D B I 9 Z A z g I 1 k 4 0 I z I U r B W O b q Z + 6 4 l p w 1 X 8 A O O E B Z I M Y h 5 x S s B K z S 4 d 8 t 5 t r 1 T 2 q t 4 M e J X 4 O S m j H P V e 6 a f b V z S V L A Y q i D E d 3 0 s g y I g G T g W b u N 3 U s I T Q E R m w j q U x k c w E 2 e z a C a 5 Y p Y 8 j p e 2 L A c / U v x 0 Z k c a M Z W g r J Y G h W f a m 4 n 9 e J 4 X o O s h 4 n K T A Y j p f F K U C g 8 L T r + M + 1 4 y C G F t C q O b 2 V k y H R B M K N i D X r W C a G l A S 5 / M W F 4 d y 4 t q c / O V U V k n z v O p f V i / u L 8 o 1 L 0 + s i E 7 Q K T p D P r p C N X S H 6 q i 
B K H p E z + g V v T k v z r v z 4 X z O S w t O 3 n O M F u B 8 / Q L H 5 J x L < / l a t e x i t > D < l a t e x i t s h a 1 _ b a s e 6 4 = " 8 L 2 f m n c y I S 7 i + T f o l x 8 c A I R q L u c = " > A A A C E H i c b V D L S g M x F M 3 U V x 1 f V Z d u g q X g q s y I q M u C L l x W s A 9 o h 5 J J M 2 1 s M h m S O 0 I Z + g / i T r / E n b j 1 D / w Q 9 6 b t L G z r g c D h n P v K C R P B D X j e t 1 N Y W 9 / Y 3 C p u u z u 7 e / s H p c O j p l G p p q x B l V C 6 H R L D B I 9 Z A z g I 1 k 4 0 I z I U r B W O b q Z + 6 4 l p w 1 X 8 A O O E B Z I M Y h 5 x S s B K z S 4 d 8 t 5 t r 1 T 2 q t 4 M e J X 4 O S m j H P V e 6 a f b V z S V L A Y q i D E d 3 0 s g y I g G T g W b u N 3 U s I T Q E R m w j q U x k c w E 2 e z a C a 5 Y p Y 8 j p e 2 L A c / U v x 0 Z k c a M Z W g r J Y G h W f a m 4 n 9 e J 4 X o O s h 4 n K T A Y j p f F K U C g 8 L T r + M + 1 4 y C G F t C q O b 2 V k y H R B M K N i D X r W C a G l A S 5 / M W F 4 d y 4 t q c / O V U V k n z v O p f V i / u L 8 o 1 L 0 + s i E 7 Q K T p D P r p C N X S H 6 q i B K H p E z + g V v T k v z r v z 4 X z O S w t O 3 n O M F u B 8 / Q L H 5 J x L < / l a t e x i t > D … … (A) (B) < l a t e x i t s h a 1 _ b a s e 6 4 = " 7 C e 4 k t O d Y J 8 P P 3 S y U L 7 6 Z w J L e q k = "  > A A A C C 3 i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L h s w T 6 g H U o m z b S h y W R I M k I Z + g X i T r / E n b j 1 I / w Q 9 6 b t L G z r g c D h n P v K C R P O t P G 8 b 6 e w t b 2 z u 1 f c d w 8 O j 4 5 P S q d n b S 1 T R W i L S C 5 V N 8 S a c h b T l m G G 0 2 6 i K B Y h p 5 1 w c j / 3 O 0 9 U a S b j R z N N a C D w K G Y R I 9 h Y q R k N S m W v 6 i 2 A N o m f k z L k a A x K P / 2 h J K m g s S E c a 9 3 z v c Q E G V a G E U 5 n b j / V N M F k g k e 0 Z 2 m M B d V B t j h 0 h i p W G a J I K v t i g x b q 3 4 4 M C 6 2 n I r S V A p u x X v f m 4 n 9 e L z X R X Z C x O E k N j c l y U Z R y Z C S a / x o N m a L E 8 K k l m C h m b 0 V k j B U m x m b j 
u h V E U m 2 k Q P m 8 1 c W h m L k 2 J 3 8 9 l U 3 S v q 7 6 N 9 V a s 1 a u e 3 l i R b i A S 7 g C H 2 6 h D g / Q g B Y Q o P A M x i H o q f A W M w v C x w W X a / q 1 / 2 p 8 D w E J V R R q V b X + w p 7 i m a S J U A F M a Y T + C l E O d H A q W C F G 2 a G p Y S O y I B 1 L C Z E M h P l 0 / s L X L N O D / e V t i 8 B P H V / T + R E G j O W s e 2 U B I Z m t j Y x / 6 t 1 M u i f R T l P 0 g x Y Q n 8 W 9 T O B Q e F J G L j H N a M g x h Y I 1 d z e i u m Q a E L B R P Z 2 5 + Y X F p u b L i r q 6 t b 2 x 6 W 9 t t o z J N W Y s q o X Q n J o Y J n r A W c B C s k 2 p G Z C z Y d X x 3 N q p f 3 z N t u E q u Y J i y S J K b h A 8 4 J W C t n r c b 5 v i h d 4 R D 0 V d g L O Y X B Q 6 L n l f 1 a / 5 Y e B a C E q q o V L P n f Y d 9 R T P J E q C C G N M N / B S i n G j g V L D C D T P D U k L v y A 3 r W k y I Z C b K x / c X + M A 6 f T x Q 2 r 4 E 8 N j 9 O 5 E T a c x Q x r Z T E r g 1 0 7 W R + V + t m 8 H g N M p 5 k m b A E v q 7 a J A J D A q P w s B 9 r h k F M b R A q O b 2 V k x v i S Y U b G S u e 4 B p Z k B J X P 4 3 u T i W h W t z C q Z T m Y X 2 U S 0 4 r t U v 6 9 W G X y Z W Q X t o H x 2 i A J 2 g B j p H T d R C F D 2 i J / S C v e J t b V 0 b l m r I O V U L p m 5 g Y J n j K O s B B s J t M M y J j w a 7 j 0 c n E v 7 5 l 2 n C V X s I 4 Y 5 E k g 5 T 3 O S V g p Z 6 H w w K H C R N A e g E O R a L A f N + L s x K H Z c 9 r + E 1 / C v y X B B V p o A r n P e 8 j T B T N J U u B C m J M N / A z i A q i g V P B S j f M D c s I H Z E B 6 1 q a E s l M V E x / U u I 9 q y S 4 r 7 Q 9 K e C p + r O j I N K Y s Y x t p S Q w N L + 9 i f i f 1 8 2 h f x w V P M 1 y Y C n 9 W t T P B Q a F J 7 H g h G t G Q Y w t I V R z + 1 Z M h 0 Q T C j Y Y z Q r J C C l E j K 3 J d W u Q p N p I A Y v / F g 9 j M X V t T / 5 y K 6 u k f V 7 3 L + s X d x f V h l c 0 V g Y n 4 B S c A R 9 c g Q a 4 B U 3 Q A g Q k 4 B m 8 g j f n x X l 3 P p z P + W j J K X a O w Q K c r 1 / E M 5 7 4 < / l a t e x i t > S 1 < l a t e x i t s h a 1 _ b a s e 6 
4 = " n 0 The derivative f ′ can be computed analytically according to Eq.1. See Figure 2A for an illustration of the model. b V u p T T o r C E S q M 1 Z t r S + c N 7 Z x E = " > A A A C F n i c b V D L S g M x F M 3 U V x 1 f V Z d u g q X g q s y I q M u C G 5 c V 7 Q M 6 Y 0 n S Y z Q r J C C l E j K 3 J d W u Q p N p I A Y v / F g 9 j M X V t T / 5 y K 6 u k f V 7 3 L + s X d x f V h l c 0 V g Y n 4 B S c A R 9 c g Q a 4 B U 3 Q A g Q k 4 B m 8 g j f n x X l 3 P p z P + W j J K X a O w Q K c M = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L i s Y B / Q D i W T Z t r Y J D M k G a E M / Q d x M = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L i s Y B / Q D i W T Z t r Y J D M k G a E M / Q d x p 1 / i T t Q 6 Y o M X x q C S a K 2 V s R G W O F i b E B u W 4 F k V S b W K B 8 3 t / F o Z i 5 N i d / N Z V 1 0 q 5 V / c t q / a M = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L i s Y B / Q D i W T Z t r Y J D M k G a E M / Q d x v u Y O R v o t 3 t E s = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z J T i U x S Q a U h H G v d 8 7 3 E B B l W h h F O Z 2 4 / 1 T T B Z I J H t G e p x I L q I F t c O 0 M V q w x R F C v 7 p E E L 9 X d H h o X W U x H a S o H N W K 9 6 c / E / r 5 e a 6 D r I m E x S Q y V Z L o p S j k y M 5 l 9 H Q 6 Y o M X x q C S a K 2 V s R G W O F i b E B u W 4 F k V S b W K B 8 3 t / F o Z i 5 N i d / N Z V 1 0 q 5 V / c t q / a 5 e b n h 5 Y k U 4 g 3 O 4 A B + u o A G 3 0 I Q W E H i A J 3 i B V + f Z M = " > A A A C E H i c b V D L S g M x F L 1 T X 3 V 8 V V 2 6 C Z a C q z I j R V 0 W 3 L i s Y B / Q D i W T Z t r Y J D M k G a E M / Q d x p 1 / i T t In practice, we stack multiple bijectives to enable more complex transformation. The derivative of the composite can be computed efficiently using the chain rule. 
At inference time, we can efficiently compute the inverse mapping f^{-1}, as rational quadratic splines are analytically invertible.

Conditional Flow on T^D To model rotamers with more than one torsional angle, we construct flows on the D-dimensional torus by composing coupling layers (Dinh et al., 2016). Each coupling layer updates one dimension using the bijection for S^1, keeping the other D - 1 dimensions fixed, and uses those D - 1 dimensions along with the hidden representation of the residue as the conditioner that parameterizes the bijection (Figure 2B):

g_d(x \mid h_i)[j] = \begin{cases} f(x_j \mid x_{\setminus j}, h_i), & j = d \bmod D \\ x_j, & j \neq d \bmod D, \end{cases}  (4)

where d is the dimension to update. The coupling layer g_d preserves invertibility and has a closed-form inverse; the determinant of its Jacobian equals the derivative of f(x_j | x_{\setminus j}, h_i). We stack multiple spline-based coupling layers so that every dimension is updated at least once, resulting in a stack of L coupling layers: g = g_1 ∘ g_2 ∘ · · · ∘ g_L (L ≥ D). We choose the uniform distribution as the base distribution, p_z(z) = (1/2π)^D, and let g map the target distribution to the base distribution. The log-likelihood is computed using the change-of-variable rule:

log p(x \mid h_i) = log p_z(g(x)) + log |\det \nabla_x g(x)| = -D \log 2\pi + \sum_{l=1}^{L} \log f'_l(x_{d(l)} \mid x_{\setminus d(l)}, h_i).  (5)
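The coupling construction can be sketched with a toy circle bijection u ↦ u + α sin u in place of the spline (all function names here, including the conditioner, are our illustrative stand-ins, not the paper's networks): each layer updates one coordinate conditioned on the others, and the log-density has exactly the shape of Eq. 5.

```python
import math

def conditioner(fixed):
    """Toy conditioner producing a parameter alpha in (-1, 1) from the fixed
    coordinates; in RDE this role is played by neural networks that also see
    the hidden representation h_i."""
    return 0.9 * math.tanh(sum(math.cos(v) for v in fixed))

def coupling_layer(x, d):
    """Update coordinate d of x on T^D with the circle bijection
    u -> u + alpha*sin(u) (mod 2*pi); other coordinates pass through.
    Returns (new point, log|det Jacobian|); the log-det is just the
    derivative of the single updated coordinate."""
    alpha = conditioner([v for j, v in enumerate(x) if j != d])
    u = x[d]
    y = list(x)
    y[d] = (u + alpha * math.sin(u)) % (2 * math.pi)
    return y, math.log(1 + alpha * math.cos(u))

def log_prob(x, dims):
    """Change-of-variable log-density with a uniform base on T^D, mirroring
    Eq. 5: -D*log(2*pi) plus one log-derivative per coupling layer."""
    logp = -len(x) * math.log(2 * math.pi)
    for d in dims:
        x, ld = coupling_layer(x, d)
        logp += ld
    return logp
```

Because each layer is a bijection of the torus and the base is uniform, the resulting density integrates to 1 over T^D, which a quick grid quadrature confirms.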

Training Objective To train RDE, we minimize the negative log-likelihood of native (ground-truth) rotamers:

min E_{(χ_i, S) ∼ p_data} [ -log p(χ_i | h_i(S)) ],

where S = {a_j, p_j, O_j, χ̂_j}_{j=1}^{N} is a protein structure sampled from the dataset. Since the model is designed for mutation analysis, the prior rotamers should emulate the rotamers after some residues are mutated. Thus, we mask the rotamers of a random portion of residues and add noise to the rotamers of the neighbors of the masked ones. The negative log-likelihood objective is evaluated on the masked residues and a random subset of the perturbed residues.
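One way to realize this corruption scheme is sketched below. This is a simplification of the paper's procedure: we perturb all unmasked residues rather than only the neighbors of masked ones, and the masking fraction and noise scale are illustrative values, not the paper's.

```python
import math
import random

def corrupt_rotamers(chi, mask_frac=0.1, noise_std=0.1, rng=random):
    """Emulate post-mutation inputs for training: mask the rotamers of a
    random subset of residues (as if mutated) and add wrapped Gaussian noise
    to the remaining ones. chi: list of per-residue angle lists; None marks
    a masked rotamer. Returns (corrupted rotamers, masked index set)."""
    n = len(chi)
    masked = set(rng.sample(range(n), max(1, int(mask_frac * n))))
    out = []
    for i, angles in enumerate(chi):
        if i in masked:
            out.append(None)  # rotamer unknown, like a mutated residue
        else:
            out.append([(a + rng.gauss(0.0, noise_std)) % (2 * math.pi)
                        for a in angles])
    return out, masked
```

The negative log-likelihood would then be evaluated on the masked residues (and a subset of the perturbed ones), with the corrupted rotamers serving as the priors χ̂ in the condition.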

3.3. ROTAMER ENTROPY ESTIMATION

RDE models the distribution of possible rotamers of a residue, making the entropy of the rotamer distribution a natural measure of conformational flexibility (Brady & Sharp, 1997):

H(S, i) = E_{χ ∼ p} [ -log p(χ | h_i(S)) ].  (7)

To estimate the entropy, we use a stochastic method: first, we sample a set of rotamers from the distribution using the inverted flows (Eqs. 1 and 4); then, we compute the negative log-probability of the samples and take their average as an estimate of the entropy. These steps are efficient thanks to the exact likelihoods available from normalizing flows. For residues without rotatable sidechains (alanine, glycine, and proline), we consider the conformational entropy to be constant. Nonetheless, we can still estimate the entropy of neighboring residues to evaluate their impact on conformational flexibility.
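The estimator itself is generic, as the sketch below shows; the sanity check uses a Gaussian with known entropy rather than a rotamer distribution, but the recipe is the same one RDE applies with samples from the inverted flow and the flow's exact likelihood.

```python
import math
import random

def mc_entropy(sample, log_prob, n=200_000):
    """Monte Carlo estimate of H = E[-log p(x)]: draw samples from the
    distribution and average the negative log-density."""
    return sum(-log_prob(sample()) for _ in range(n)) / n

# Sanity check on a distribution with known entropy: for N(0, sigma^2),
# H = 0.5 * log(2*pi*e*sigma^2).
random.seed(0)
sigma = 0.5
log_p = lambda x: -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
estimate = mc_entropy(lambda: random.gauss(0.0, sigma), log_p)
exact = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
```

With 200,000 samples the Monte Carlo estimate agrees with the analytic entropy to within a few thousandths, since the standard error shrinks as 1/√n.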

3.4. MUTATIONAL EFFECT (∆∆G) PREDICTION

Let us consider a protein-protein complex W_LR = {a_i, p_i, O_i, χ_i}_{i=1}^{N} consisting of N residues, where M of them belong to the first group L = {1, ..., M} and the remaining N - M belong to the second group R = {M+1, ..., N}. We refer to group L as the ligand group, since it contains the mutations, and to group R as the receptor group. In the unbound state, we denote the two separated structures by W_L and W_R, respectively. If we mutate m residues in group L, numbered {1, ..., m} ⊂ L, the mutated structure in the bound state is denoted by

M_LR = {ã_i, p_i, O_i, ∅}_{i=1}^{m} ∪ {a_i, p_i, O_i, χ_i}_{i=m+1}^{N},

where ã_i is the mutated residue type and ∅ indicates that the rotamers of the mutated residues are unknown. The two groups in the unbound state are denoted by M_L and M_R, respectively. Note that M_R = W_R if there are no mutations in the receptor group.
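A minimal bookkeeping sketch of this notation, assuming a simplified residue record (the orientation O_i is omitted for brevity): `mutate` sets the rotamers of mutated residues to unknown, matching the ∅ entries in M_LR, and `unbound` splits a complex into its two groups.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Residue:
    aa: str                           # amino acid type a_i
    pos: Tuple[float, float, float]   # position p_i (orientation O_i omitted)
    chi: Optional[Tuple[float, ...]]  # rotamer chi_i; None = unknown

def mutate(complex_LR, group_L, mutations):
    """Apply {index: new_aa} mutations to ligand-group residues;
    mutated rotamers become unknown (None), matching M_LR above."""
    out = []
    for i, res in enumerate(complex_LR):
        if i in mutations:
            assert i in group_L, "mutations are restricted to the ligand group"
            out.append(replace(res, aa=mutations[i], chi=None))
        else:
            out.append(res)
    return out

def unbound(complex_LR, group_L):
    """Split the bound complex W_LR into the separated structures W_L, W_R."""
    W_L = [r for i, r in enumerate(complex_LR) if i in group_L]
    W_R = [r for i, r in enumerate(complex_LR) if i not in group_L]
    return W_L, W_R
```

With no mutations in the receptor group, the receptor in the unbound state is identical before and after mutation (M_R = W_R), as the text notes.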

Linear Predictor

The entropy loss upon binding of a protein complex S (S = W, M) is defined as the difference in entropy between the bound and the unbound states (Kastritis & Bonvin, 2013). We approximate the entropy loss ∆H_S using a linear model over our estimated entropies:

∆H_S = w^bound_SL H^bound_SL + w^bound_SR H^bound_SR - (w^unbnd_SL H^unbnd_SL + w^unbnd_SR H^unbnd_SR),  (S = W, M),   (8)

where H^bound_SL = Σ_{i∈L} (H(S_LR, i) + E(a_i)) is the bound-ligand term, H^bound_SR = Σ_{i∈R} (H(S_LR, i) + E(a_i)) the bound-receptor term, H^unbnd_SL = Σ_{i∈L} (H(S_L, i) + E(a_i)) the unbound-ligand term, and H^unbnd_SR = Σ_{i∈R} (H(S_R, i) + E(a_i)) the unbound-receptor term. Here, w_SJ > 0 (S = W, M; J = L, R) is a coefficient controlling the contribution of each entropy term to the binding free energy, and E(·) is an entropy bias for the 20 amino acid types. The thermodynamic definition states that the core component of the binding ∆∆G is the difference in entropy loss between the mutant and wild-type structures. Therefore, we estimate ∆∆G by:

∆∆G_pred = ∆H_M - ∆H_W + b
= (w^bound_ML H^bound_ML + w^bound_MR H^bound_MR) - (w^unbnd_ML H^unbnd_ML + w^unbnd_MR H^unbnd_MR) - (w^bound_WL H^bound_WL + w^bound_WR H^bound_WR) + (w^unbnd_WL H^unbnd_WL + w^unbnd_WR H^unbnd_WR) + b.   (9)

Note that since we assume there are no mutations in the receptor group (M_R = W_R), H^unbnd_MR and H^unbnd_WR cancel each other out and do not contribute to ∆∆G. To calibrate the parameters of Eq. 9, we use block coordinate descent, alternating between optimizing the coefficients (w_SJ, b) and the entropy biases E under the mean squared error (MSE) loss.

Neural Network Predictor  The hidden representation h_i of each residue, used to parameterize the normalizing flows, contains sufficient information about the rotamer distribution. To extract binding information from these representations in a more flexible way, we employ neural networks. Specifically, a network sharing the same architecture as the encoder transforms the representations h_i, and max-pooling yields a global structure representation.
We then subtract the representation of the wild-type structure from that of the mutant and feed the difference into an MLP to predict ∆∆G. To enforce anti-symmetry, we also swap the wild-type and mutant to predict -∆∆G, and take (∆∆G - (-∆∆G))/2 as the final prediction. The network is trained with the MSE loss. During training, we freeze the weights of RDE and do not back-propagate gradients through h_i, so as to fully exploit the unsupervised representations learned by RDE.
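The anti-symmetrization step can be sketched directly. Here, `predict` is a stand-in for the MLP applied to the difference of pooled representations; the toy check confirms that averaging the two directions cancels any non-antisymmetric component, such as a bias term.

```python
import numpy as np

def ddg_antisymmetric(predict, repr_wt, repr_mut):
    """Enforce DDG(wt->mut) = -DDG(mut->wt) by averaging the two
    directions, as described above. `predict` maps a difference of
    pooled structure representations to a scalar DDG."""
    fwd = predict(repr_mut - repr_wt)   # predicts DDG
    bwd = predict(repr_wt - repr_mut)   # predicts -DDG
    return 0.5 * (fwd - bwd)

# Toy check with an arbitrary (non-antisymmetric) predictor.
w = np.array([0.5, -1.0, 2.0])
predict = lambda d: float(w @ d + 0.3)   # the bias breaks antisymmetry
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 2.5, 2.0])
```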

3.5. MODEL TRAINING

The dataset for training RDE is derived from PDB-REDO (Joosten et al., 2014), a database of refined X-ray structures from the PDB. The protein chains are clustered at 50% sequence identity, yielding 38,413 chain clusters, which are randomly divided into training, validation, and test sets (95%/0.5%/4.5%, respectively). During training, the data loader randomly selects a cluster and then randomly chooses a chain from that cluster to ensure balanced sampling. We crop structures into patches of 128 residues by first choosing a seed residue and then selecting its 127 nearest neighbors based on C-beta distances. To simulate mutations, we mask the rotamers of 10% of the residues in the patch and add noise to the rotamers of residues whose C-beta distance to the closest masked residue is less than 8 Å. The SKEMPI2 database (Jankauskaitė et al., 2019) is used to train the models for ∆∆G prediction described in Section 3.4. We split the dataset into 3 folds by structure, each containing unique protein complexes that do not appear in the other folds. Two folds are used for training and validation, and the remaining fold for testing. This yields 3 different sets of parameters and ensures that every data point in SKEMPI2 is tested once.
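The cropping and masking procedure can be sketched as follows; this is a simplified version using only C-beta coordinates, where `cbeta` is an N×3 array (the rotamer noise itself is described in Appendix A.2).

```python
import numpy as np

def crop_patch(cbeta, seed, k=128):
    """Crop a structure to a patch of k residues: the seed residue
    plus its k-1 nearest neighbors by C-beta distance."""
    d = np.linalg.norm(cbeta - cbeta[seed], axis=1)
    return np.argsort(d)[:k]   # indices of the patch residues

def mask_and_perturb(cbeta, patch, rng, mask_ratio=0.10, radius=8.0):
    """Mask rotamers of 10% of patch residues; flag unmasked neighbors
    within 8 A of any masked residue for rotamer noise."""
    n_mask = max(1, int(round(mask_ratio * len(patch))))
    masked = rng.choice(patch, size=n_mask, replace=False)
    d_to_masked = np.linalg.norm(
        cbeta[patch][:, None, :] - cbeta[masked][None, :, :], axis=-1).min(axis=1)
    noised = patch[(d_to_masked < radius) & ~np.isin(patch, masked)]
    return masked, noised
```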

4.1. PREDICTION OF THE EFFECT OF MUTATIONS ON BINDING

Baselines  We evaluate our two ∆∆G predictors, RDE-Linear and RDE-Network, against several categories of baseline methods. The first category comprises traditional empirical energy functions, including Rosetta Cartesian ddG (Park et al., 2016; Alford et al., 2017; Leman et al., 2020) and FoldX (Delgado et al., 2019). The second consists of sequence/evolution-based methods, represented by ESM-1v (Meier et al., 2021), PSSM (position-specific scoring matrix), MSA Transformer (Rao et al., 2021), and Tranception (Notin et al., 2022). The third includes end-to-end learning models, such as DDGPred (Shan et al., 2022) and a model that shares the same encoder architecture as RDE but uses an MLP to directly predict ∆∆G (End-to-End). The fourth consists of unsupervised/semi-supervised learning methods, including ESM-IF (Hsu et al., 2022) and Masked Inverse Folding (MIF) (Yang et al., 2022). Similar to our RDE-Network, this last class of methods pre-trains a network on structures and uses the pre-trained representations to predict ∆∆G. The baseline MIF network also uses the same encoder architecture as RDE for comparison, and has two variants for ∆∆G prediction: MIF-∆logit, which uses the difference in log-probability of amino acid types to predict ∆∆G, and MIF-Network, which predicts ∆∆G from the learned representations using the same network architecture as RDE-Network. Finally, given that our method is based on conformational flexibility, we also train a network to predict the B-factor of residues and use the predicted B-factors in place of entropy to predict ∆∆G.

Metrics  We use five metrics to evaluate the accuracy of ∆∆G prediction: Pearson correlation coefficient, Spearman's rank correlation coefficient, minimized RMSE (root mean squared error), minimized MAE (mean absolute error), and AUROC (area under the receiver operating characteristic curve). To calculate AUROC, mutations are classified by the sign of ∆∆G.
In practical applications, the correlation for one specific protein complex is often of greater interest. Therefore, we group mutations by structure, discard groups with fewer than 10 mutation data points, and calculate correlations for each structure separately. This yields two additional metrics: the average per-structure Pearson correlation coefficient and the average per-structure Spearman correlation coefficient.
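The per-structure correlation metric can be sketched as below; `spearman` assumes no ties (which holds for continuous ∆∆G predictions), and `groups` maps each complex to its lists of (predicted, experimental) values.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation, assuming no ties."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def per_structure_spearman(groups, min_points=10):
    """Average per-structure correlation: group mutations by complex,
    drop groups with fewer than 10 points, correlate each separately."""
    scores = [spearman(np.asarray(p), np.asarray(t))
              for p, t in groups.values() if len(p) >= min_points]
    return float(np.mean(scores)) if scores else float("nan")
```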

Results

According to Table 1, our RDE-Network outperforms all the baselines. Notably, it demonstrates a significant improvement in per-structure correlations, indicating greater reliability in practical applications. The superior performance of RDE-Network over MIF-Network suggests that representations derived from fitting rotamer densities are more effective than those from masked inverse folding, as RDE captures atomic interactions well by modeling the conformation of sidechain atoms. RDE-Linear achieves performance comparable to Rosetta and outperforms some unsupervised learning baselines. While it does not surpass most baseline methods over the entire SKEMPI2 dataset, its performance is better when considering only single-point mutations (Table 6 in the appendix). This may be because simple linear models cannot capture the non-linear relationships that dominate multi-point mutations. Nevertheless, RDE-Linear demonstrates that basic statistics of the estimated rotamer density alone can predict ∆∆G, which lays the foundation for the more accurate RDE-Network. Sequence-based models do not accurately predict ∆∆G for protein-protein binding, as discussed in Section 2.2. Figure 3 shows the distribution of per-structure correlation coefficients. Please refer to Section C of the appendix for more results and discussion.

4.2. OPTIMIZATION OF HUMAN ANTIBODIES AGAINST SARS-COV-2

In Shan et al. (2022), the authors report five single-point mutations of a human antibody against SARS-CoV-2 that enhance neutralization effectiveness. These mutations are among the 494 possible single-point mutations in the heavy-chain CDR region of the antibody. We use the most competitive methods benchmarked in Section 4.1 to predict ∆∆Gs for all the single-point mutations and rank them in ascending order (lowest ∆∆G at the top). The effectiveness of a predictor is determined by how many favorable mutations it ranks near the top. As shown in Table 2, RDE-Network and DDGPred each successfully identify three mutations (ranking ≤ 10%), with RDE-Network ranking them higher.

Analysis of Entropy Terms  We perform linear regression on the SKEMPI2 single-mutation dataset and present the regression coefficients, bias, and P-values in Table 3. According to these statistics, all entropy terms except the entropy of the unbound receptor H^unbnd_R (coefficient w^unbnd_R) show a significant relationship with experimental ∆∆Gs. The coefficients of the significant terms are all positive and of roughly similar magnitude. The entropy of the unbound receptor contributes nothing because the receptor alone is not involved in the mutation. These results agree with the thermodynamic definition of the change in binding free energy, ∆∆G = ∆G_M - ∆G_W = (G_{M_LR} - G_{M_L} - G_{M_R}) - (G_{W_LR} - G_{W_L} - G_{W_R}), where G_{M_R} and G_{W_R} cancel each other out as the receptor is unmutated. This indicates that our model captures well the thermodynamics underlying protein-protein interactions. For a detailed discussion of the thermodynamic background, please refer to Section B in the appendix.

Correlation Between Estimated Entropy and B-factors  The B-factor is an experimental measurement that quantifies conformational flexibility. We calculate the average B-factor of the sidechain atoms of residues in the test split of the PDB-REDO dataset.
Then, we estimate the conformational entropy of each residue in the test split using RDE. The average Pearson correlation coefficient between the two quantities is 0.4637, and the average Spearman coefficient is 0.4282 (detailed results are presented in Table 8 in the appendix). In summary, the entropy estimated by RDE correlates with experimentally determined conformational flexibility as measured by B-factors.

Sidechain Conformation Prediction  RDE is a generative model of protein sidechain structures, so it can predict sidechain conformations by sampling from the estimated distribution. We use RDE to sample sidechain torsional angles (rotamers) for structures with 10% of sidechains removed in our test split of PDB-REDO. For each residue, 10 rotamers are sampled independently, and the one with the highest probability is selected as the final prediction. We compare RDE with two baseline methods, Rosetta (fixbb) (Leman et al., 2020) and SCWRL4 (Krivov et al., 2009). As shown in Table 4, RDE outperforms the baselines on all four torsional angles in terms of angular error. For detailed per-amino-acid accuracy, please refer to Table 9 in the appendix.
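The angular-error metric used in this evaluation can be sketched as follows; the only subtlety is accounting for the 360° periodicity of torsional angles when taking absolute differences.

```python
import numpy as np

def angular_error_deg(pred, true):
    """Mean absolute angular error in degrees, accounting for the
    periodicity of torsional angles (e.g., 359 deg vs 1 deg -> 2 deg)."""
    d = np.abs(np.asarray(pred) - np.asarray(true)) % 360.0
    return float(np.mean(np.minimum(d, 360.0 - d)))
```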

5. CONCLUSIONS

In this work, we introduce the Rotamer Density Estimator (RDE), which estimates the distribution of rotamers for protein sidechains. We demonstrate that RDE leads to improved accuracy in predicting binding ∆∆G compared to other methods. One limitation of RDE is its inability to model backbone flexibility directly, which is an important future direction for extending the proposed model. Nonetheless, our work highlights the potential of machine learning techniques to improve mutational effect prediction for protein-protein interaction.

ESM-1v (Meier et al., 2021)  We use the implementation provided in the ESM open-source repository. Protein language models can only predict the effect of mutations on single protein sequences, so we ignore cases where mutations span multiple sequences. We extract the sequence of the mutated protein chain from the SEQRES entry of the PDB file. We use the masked-marginals mode to score both wild-type and mutant sequences and use their difference as the estimate of ∆∆G.

PSSM  We construct MSAs from the Uniref90 database (Suzek et al., 2007) for chains with mutation annotations in the SKEMPI dataset, using Jackhmmer (Johnson et al., 2010) version 3.3.1 following the setting in Meier et al. (2021). The MSAs are filtered using HHfilter (Steinegger et al., 2019) with coverage 75 and sequence identity 90; this HHfilter setting is reported to give the best performance for MSA Transformer according to Meier et al. (2021).
We calculate position-specific scoring matrices (PSSMs) and use the change in probability as the prediction of ∆∆G.

MSA Transformer (Rao et al., 2021)  We use the implementation provided in the ESM open-source repository. We input the MSAs constructed for the PSSM evaluation to MSA Transformer and use the masked-marginals mode to score both wild-type and mutant sequences, taking their difference as the prediction of ∆∆G.

Tranception (Notin et al., 2022)  We use the implementation provided in the Tranception open-source repository and predict mutational effects using the large model checkpoint. The previously built MSAs (not filtered by HHfilter) are used for inference-time retrieval.

DDGPred (Shan et al., 2022)  We use the implementation accompanying the paper by Shan et al. (2022). Since this model requires predicted sidechain structures of the mutant, we use the mutant structures packed during our evaluation of Rosetta to train the model and run prediction.

End-to-End  The end-to-end model shares the same encoder architecture as the rotamer density estimator. The difference is that in RDE, normalizing flows follow the encoder to model rotamer distributions, whereas in the end-to-end model, the embeddings are fed directly to an MLP that predicts ∆∆G.

B-factor  This model predicts per-atom B-factors for proteins. It has the same encoder architecture as RDE, followed by an MLP that predicts, for each amino acid, a vector whose dimensions are the predicted B-factors of the atoms in that amino acid. The amino acid-level B-factor is the average of the atom-level B-factors. The predicted B-factors serve as a measure of conformational flexibility and are used to predict ∆∆G with the same linear model as RDE-Linear (Eq. 9).

ESM-IF (Hsu et al., 2022)  ESM-IF can score protein sequences using the log-likelihood. The scoring function implementation is provided in the ESM repository.
We enable the --multichain-backbone flag to let the model see the whole protein-protein complex. We subtract the log-likelihood of the wild-type from that of the mutant to predict ∆∆G.

MIF Architecture

The masked inverse folding (MIF) network uses the same encoder architecture as RDE. Following the encoder is a per-amino-acid 20-category classifier that predicts the type of masked amino acids. We use the same PDB-REDO train-test split to train the model. At training time, we randomly crop a patch of 128 residues and randomly mask 10% of the amino acids. The model learns to recover the types of the masked amino acids with the standard cross-entropy loss.

MIF-∆logit

To score mutations, we first mask the type of the mutated amino acids and then use the log-probability of the amino acid type as the score. Analogously to the entropy terms of RDE-Linear, we obtain scores for the wild-type bound ligand, wild-type bound receptor, wild-type unbound ligand, wild-type unbound receptor, mutant bound ligand, mutant bound receptor, and mutant unbound ligand. We then use a linear model identical to RDE-Linear (Eq. 9) to predict ∆∆G from these scores.

MIF-Network  This is analogous to RDE-Network; the difference is that we use the pre-trained encoder of MIF rather than the encoder of RDE. We also freeze the MIF encoder, as we aim to utilize its unsupervised representations.

A.4 SOURCE CODE

Available at https://github.com/luost26/RDE-PPI.

B BACKGROUND

This section introduces the thermodynamic principle underlying the design of RDE, which connects entropy and binding affinity. The Gibbs free energy of association is the physical quantity used to measure the binding affinity between two groups of proteins: ∆G_a = ∆H - T∆S. Here, ∆H is the change in enthalpy upon formation of the complex, which is generally assumed to be negligible since no covalent bonds are formed or broken upon protein-protein binding; T is the temperature; and ∆S is the change in entropy upon binding (Kastritis & Bonvin, 2013). Ignoring ∆H and expanding ∆S, we can rewrite ∆G_a as ∆G_a = T(S_d - S_a), where S_a is the entropy of the proteins in the bound state (complex) and S_d is the entropy in the unbound state (separated). To predict binding affinity, we therefore need the entropy of the separated proteins (S_d) and the entropy of the protein complex (S_a). The entropy S is defined by the Boltzmann expression:

S = -k_B ∫ p(x) log p(x) dx = -k_B E_{x∼p}[log p(x)],

where p(x) is the distribution of conformation x and k_B is the Boltzmann constant (Brady & Sharp, 1997). To calculate the entropy of the protein complex S_a, we evaluate the integral with respect to the distribution of the complex conformation, denoted p_a. To calculate the entropy of the separated proteins S_d, we factorize the probability density as p_d = p_ligand(x_ligand) · p_receptor(x_receptor) and evaluate

S_d = -k_B ( E_{x∼p_ligand}[log p_ligand(x)] + E_{x∼p_receptor}[log p_receptor(x)] ).

Substituting S_d and S_a into ∆G_a, we obtain

∆G_a = -k_B T ( E_{x∼p_ligand}[log p_ligand(x)] + E_{x∼p_receptor}[log p_receptor(x)] - E_{x∼p_a}[log p_a(x)] ),

which indicates that we can predict binding affinity by estimating the entropy of the conformation distributions of the protein complex p_a and of each separated protein, p_ligand and p_receptor.
We assume that sidechain conformation changes are the major determinant of protein-protein binding, so we keep the protein backbone fixed and model only the distribution of sidechain conformations (rotamers) (Najmanovich et al., 2000; Cole & Warwicker, 2002). This assumption leads to the core component of this work, the Rotamer Density Estimator (RDE), which approximates p_a, p_ligand, and p_receptor, enabling us to estimate ∆G_a by evaluating the entropy of these distributions. Finally, to evaluate the effect of mutations, we apply RDE to estimate ∆G_a for both the wild-type and the mutant and compute their difference: ∆∆G = ∆G_mutant - ∆G_wild-type. We refer the reader to Brady & Sharp (1997) and Kastritis & Bonvin (2013) for a comprehensive treatment of the relationship between binding affinity and entropy.
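The sign of the entropy loss in this formulation has a simple information-theoretic reading: if the unbound marginals are taken to coincide with those of the bound joint distribution (a simplification; in reality the conformational distributions differ between states), then S_d - S_a equals the mutual information between ligand and receptor conformations, which is non-negative. The discrete toy example below, with a hypothetical 2×2 joint distribution, illustrates this.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (k_B omitted)."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical joint distribution over (ligand, receptor) conformations.
p_joint = np.array([[0.30, 0.10],
                    [0.05, 0.55]])
p_lig = p_joint.sum(axis=1)   # marginal over ligand conformations
p_rec = p_joint.sum(axis=0)   # marginal over receptor conformations

S_a = entropy(p_joint.ravel())          # bound-state entropy
S_d = entropy(p_lig) + entropy(p_rec)   # unbound-state entropy (factorized)
delta = S_d - S_a   # proportional to the entropy loss; equals I(L;R) >= 0
```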






Figure 1: The conformational flexibility of the interface generally decreases upon binding.

Figure 2: (A) The overall architecture of Rotamer Density Estimator (RDE) for estimating distributions of rotamers with one torsional angle. (B) Invertible coupling layers alternating between different dimensions enable modeling distributions of rotamers with multiple torsional angles.

Inversion  To find the solution of f^{-1}(y) (Rezende et al., 2020), the first step is to locate the unique bin that contains y. Assuming y belongs to the k-th bin, finding the corresponding x amounts to finding the root of the quadratic equation f_k(x | x_{k,k+1}, y_{k,k+1}, δ_{k,k+1}) = y in the interval [x_k, x_{k+1}], for which a closed-form solution exists.

Conditional Flow on T^D  Rotamers with D torsional angles can be viewed as points on the D-dimensional torus, the product of D circles S^1, i.e., T^D = S^1 × ··· × S^1. To model the distribution on T^D, we adopt the coupling-layer technique to model the joint distribution (Dinh et al., 2016).

Figure 3: Left: The distribution of per-structure Spearman correlation coefficients. Middle: Correlation between experimental ∆∆Gs and ∆∆Gs predicted by RDE-Linear. Right: Correlation between experimental ∆∆Gs and ∆∆Gs predicted by RDE-Network.

Figure 5: Correlation between experimental ∆∆Gs and ∆∆Gs predicted by RDE-Linear and RDE-Network on SKEMPI2 singlemutation subset.

Figure 6: Correlation between experimental ∆∆Gs and ∆∆Gs predicted by RDE-Linear and RDE-Network on SKEMPI2 multimutation subset.

Evaluation of ∆∆G prediction on the SKEMPI2 dataset. RDE-Network outperforms baseline methods. Most notably, RDE-Network significantly improves per-structure correlations, which are more relevant to practical applications.

Rankings of the five favorable mutations on the human antibody against SARS-CoV-2 by various competitive methods. RDE-Network ranks 3 of the 5 mutations in the top place (<10%).

Linear regression shows that the relevant terms estimated by RDE correlate significantly to ∆∆G.

Mean absolute error of the predicted sidechain torsional angles.

Evaluation of ∆∆G predictors on the single-mutation subset of SKEMPI2.

Evaluation of ∆∆G predictors on the multi-mutation subset of SKEMPI2.

Mean absolute error of the predicted sidechain torsional angles.

ACKNOWLEDGMENTS

Supported by National Key R&D Program of China No. 2021YFF1201600.

A IMPLEMENTATION DETAILS

A.1 NETWORK ARCHITECTURE

The encoder of the rotamer density estimator is a stack of attention layers that are invariant to rotation and translation (Jumper et al., 2021) of the input protein structure. Let h^ℓ_i denote the embedding of the i-th amino acid output by the previous attention layer. The logit of the attention weight between residue i and residue j is computed from the embeddings by MLP networks Q, K, G, and H, which produce queries, keys, pairwise bias, and distance bias, respectively. The attention weight is obtained by taking the softmax over the j dimension. In practice, we use multiple attention heads, each with its own attention weights. The value vector v_ij used to update the representation of residue i from residue j is also computed from the embeddings; finally, the sum of {v_ij}_j weighted by the attention weights {w_ij}_j updates the representation of residue i through a residual connection and layer normalization, similar to the standard Transformer architecture.

The rotamer density estimator has 6 encoder layers. Node features and pairwise features have 128 and 64 channels, respectively. The normalizing flow has 8 blocks, and each spline has 65 knots, dividing [0, 2π] into 64 bins.
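The attention equations themselves are not reproduced in this text, so the sketch below assumes standard forms: scaled dot-product logits from queries and keys, plus additive pairwise and distance bias terms. `Wq`, `Wk`, `Wv` reduce the Q, K, and value MLPs to linear maps, the bias MLPs are reduced to identity and negation, and layer normalization is omitted; all of these are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(h, z_pair, dist, Wq, Wk, Wv):
    """One single-head layer in the spirit of the text: logits from
    queries/keys plus a pairwise bias G(z_ij) and a distance bias
    H(d_ij); here the biases are identity / negative distance,
    standing in for the MLPs in the paper (assumed forms)."""
    N, c = h.shape
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    logits = (q @ k.T) / np.sqrt(c) + z_pair - dist   # closer residues attend more
    w = softmax(logits, axis=1)                       # softmax over the j dimension
    return h + w @ v                                  # residual update (layernorm omitted)
```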

A.2 TRAINING

The rotamer density estimator is trained with the Adam optimizer for 200K iterations. The initial learning rate is 0.0001. The learning rate decays by a factor of 0.8 if the validation loss does not decrease over the last 5 validation steps (the model is validated every 1,000 iterations), until the learning rate reaches 0.000001. The batch size is 64. Training takes 8h56m in total on a single A100 GPU.

To emulate mutations, the rotamers of 10% of amino acids are masked. Noise is added to the rotamers of amino acids whose C-beta distance to the closest masked amino acid is less than 8.0 Å. The noise added to the χ angles consists of two components. The first is Gaussian noise centered at 0 and wrapped into [-π, π], whose standard deviation depends on the C-beta distance β_ij, where i is the index of the nearest masked amino acid to the j-th amino acid that the noise is added to. The second is random flipping (adding π to the angle): every χ angle in the 8 Å neighborhood has a 25% chance of being flipped. Our noise model is entirely empirical; there are other ways to perturb rotamers, for example using rotamer libraries (Dunbrack Jr & Karplus, 1993; Bower et al., 1997; Dunbrack Jr, 2002; Shapovalov & Dunbrack Jr, 2011). We leave the search for an optimal noise model that emulates mutations to future work.
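The noise model can be sketched as below. The exact distance-dependent standard deviation is not reproduced in this text, so a schedule where σ shrinks linearly with the C-beta distance `beta` is an assumed stand-in; the wrapped-Gaussian and 25%-flip components follow the description above.

```python
import numpy as np

def perturb_chi(chi, beta, rng, radius=8.0, flip_p=0.25):
    """Empirical rotamer noise: wrapped Gaussian noise plus random
    pi-flips, applied to chi angles of a residue at C-beta distance
    `beta` from its nearest masked residue. The linear sigma schedule
    is an assumed stand-in for the paper's (elided) form."""
    sigma = 0.5 * max(0.0, 1.0 - beta / radius)       # assumed schedule
    noise = rng.normal(0.0, sigma, size=len(chi))
    out = np.asarray(chi, dtype=float) + noise
    flips = rng.random(len(chi)) < flip_p             # 25% flip chance per angle
    out = out + np.pi * flips
    return (out + np.pi) % (2 * np.pi) - np.pi        # wrap into [-pi, pi)
```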

A.3 BASELINES

Baselines that require training and calibration on the SKEMPI2 dataset (DDGPred, End-to-End, B-factor, MIF-∆logit, MIF-Network) are trained independently on the 3 different splits of the dataset described in Section 3.5. This ensures that every data point in the SKEMPI2 dataset is tested exactly once. Below are descriptions of the implementation of the baseline methods.

Rosetta (Alford et al., 2017; Leman et al., 2020) The version we used is 2021.16, and the scoring function is ref2015_cart. Every protein structure in the SKEMPI2 dataset is first pre-processed using the relax application. The mutant structure is built by cartesian_ddg. The binding free energies of both the wild-type and mutant structures are predicted by the interface energy (dG_separated/dSASAx100). Finally, the binding ∆∆G is calculated by subtracting the binding energy of the wild-type structure from that of the mutant.

FoldX (Delgado et al., 2019) Structures are first relaxed using the RepairPDB command. Mutant structures are built with the BuildModel command from the repaired structure. The change in binding free energy ∆∆G is calculated by subtracting the wild-type energy from the mutant energy.
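Both pipelines end with the same subtraction, whose sign convention is easy to get backwards. A minimal sketch (function name ours):

```python
def binding_ddg(dg_wildtype, dg_mutant):
    """ddG = dG(mutant) - dG(wild-type).

    More negative dG means tighter binding, so a positive ddG indicates
    the mutation weakens binding and a negative ddG indicates it
    strengthens binding.
    """
    return dg_mutant - dg_wildtype
```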

C.1 FURTHER ANALYSIS ON THE PERFORMANCE OF SEQUENCE-BASED METHODS

As discussed in Section 2.2, sequence-based (evolution-based) methods are unsuitable for predicting protein-protein interactions due to the lack of co-evolutionary information between the two proteins. This is supported by the results in Table 1, which indicate that sequence-based methods are inaccurate in predicting ∆∆G. We analyzed two classes of PPIs to better understand the performance of sequence-based methods on PPIs.

Antibody-antigen binding is a typical class of PPI that lacks co-evolutionary information. The variability of the binding interface of antibodies (the complementarity-determining region, CDR) means that there is no evolutionary history in this region, making it infeasible to mine mutational preferences from sequence databases. Additionally, in most cases, antigens do not evolve to increase binding to specific antibodies, so sequence databases provide little information about mutational effects on antibody-antigen binding. We evaluated per-structure Spearman correlation coefficients of MSA Transformer and RDE-Network on antibody-antigen complexes from the SKEMPI2 dataset. The average Spearman score of MSA Transformer is 0.0744 and the average score of RDE-Network is 0.4284. Figure 4 shows the results, where the x-axis and y-axis are the per-structure Spearman coefficients of MSA Transformer and RDE-Network, respectively. Orange crosses represent antibody-antigen complexes, and blue dots represent other complexes. The results indicate that when little co-evolutionary information is available, as in antibody-antigen binding, structure-based methods, represented by our RDE, outperform evolution-based methods, represented by MSA Transformer.

When the proteins in a complex come from the same organism, evolution-based methods are more likely to be effective. These proteins usually function together in the organism, so they evolve together.
Mutations that enhance complex formation may be more favorable, and this preference might be reflected in evolutionary history. We inspected the 10 complexes on which MSA Transformer performed best in terms of Spearman coefficients and found that 9 of them consist of proteins from a single organism (Table 5). However, when evaluating MSA Transformer on all the single-organism complexes, its Spearman score is 0.1651, which is still low. The reason may be that even if the proteins in a complex come from the same organism, each member protein might also need to bind other proteins to be functional. In this case, it is more challenging to predict its binding in a specific complex from its general evolutionary history.
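The per-structure evaluation protocol used in this analysis (compute a Spearman coefficient within each complex, then average across complexes) can be sketched as follows; the record format and function name are illustrative.

```python
from scipy.stats import spearmanr

def per_structure_spearman(records):
    """Average the Spearman coefficient over complexes.

    records: iterable of (complex_id, predicted_ddg, experimental_ddg)
    tuples, one per mutation. Complexes with fewer than 2 mutations are
    skipped, since a rank correlation is undefined there.
    """
    by_complex = {}
    for cid, pred, exp in records:
        by_complex.setdefault(cid, []).append((pred, exp))
    scores = []
    for pairs in by_complex.values():
        if len(pairs) < 2:
            continue  # correlation undefined for a single mutation
        pred, exp = zip(*pairs)
        rho, _ = spearmanr(pred, exp)
        scores.append(rho)
    return sum(scores) / len(scores)
```

Averaging per complex, rather than pooling all mutations, prevents complexes with many annotated mutations from dominating the score.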

