PROTEIN SEQUENCE AND STRUCTURE CO-DESIGN WITH EQUIVARIANT TRANSLATION

Abstract

Proteins are macromolecules that perform essential functions in all living organisms. Designing novel proteins with specific structures and desired functions has been a long-standing challenge in the field of bioengineering. Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models, both of which suffer from high inference costs. In this paper, we propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state from random initialization, based on context features given a priori. Our model consists of a trigonometry-aware encoder that reasons about geometrical constraints and interactions from context features, and a roto-translation equivariant decoder that translates protein sequence and structure interdependently. Notably, all amino acids are updated in one shot in each translation step, which significantly accelerates the inference process. Experimental results across multiple tasks show that our model outperforms previous state-of-the-art baselines by a large margin, and is able to design proteins of high fidelity in terms of both sequence and structure, with running times orders of magnitude lower than those of sampling-based methods.

1. INTRODUCTION

Proteins are macromolecules that mediate the fundamental processes of all living organisms. For decades, researchers have sought to design novel proteins with desired properties (Huang et al., 2016), a problem known as de novo protein design. Nevertheless, the problem is very challenging due to the tremendous search space of both sequence and structure, and the most well-established approaches still rely on hand-crafted energy functions and heuristic sampling algorithms (Leaver-Fay et al., 2013; Alford et al., 2017), which are prone to arriving at suboptimal solutions and are computationally intensive and time-consuming. Recently, machine learning approaches have demonstrated impressive performance on different aspects of protein design, and significant progress has been made (Gao et al., 2020). Most approaches use deep generative models to design protein sequences based on corresponding structures (Ingraham et al., 2019; Jing et al., 2021; Hsu et al., 2022). Despite their great potential, the structures of proteins to be engineered are often unknown (Fischman & Ofran, 2018), which hinders the application of these methods. Therefore, efforts have been made to develop models that co-design the sequence and structure of proteins (Anishchenko et al., 2021; Wang et al., 2021). As a pioneering work, Jin et al. (2021) propose an autoregressive model that co-designs the Complementarity Determining Region (CDR) sequences and structures of antibodies based on iterative refinement of protein structures, which has spurred many follow-up works (Luo et al., 2022; Kong et al., 2022). Nevertheless, these approaches are tailored for antibodies, and their effectiveness remains unclear on proteins with arbitrary domain topologies (Anand & Achim, 2022). In addition, they often suffer from high inference costs due to autoregressive sampling or annealed diffusion sampling (Song & Ermon, 2019; Luo et al., 2022).
Very recently, Anand & Achim (2022) propose another diffusion-based generative model (Ho et al., 2020) for general protein sequence-structure co-design, which adopts three diffusion models to generate the structures, sequences, and rotamers of proteins in sequential order. Although applicable to proteins of all topologies, such a sequential generation strategy fails to cross-condition on sequence and structure, which might lead to inconsistent proteins. Besides, the inference process is also expensive due to the use of three separate diffusion processes. To address the aforementioned issues, in this paper, we propose a new method capable of protein sequence-structure co-design called PROTSEED. Specifically, we formulate the co-design task as a translation problem in the joint sequence-structure space based on context features. Here, the context features represent prior knowledge encoding the constraints that biologists want to impose on the protein to be designed (Dou et al., 2018; Shen et al., 2018). As an illustration, we present three protein design tasks with different given context features in Figure 1. PROTSEED consists of a trigonometry-aware encoder that infers geometrical constraints and prior knowledge for protein design from context features, and a novel roto-translation equivariant decoder that iteratively translates proteins into desired states in an end-to-end and equivariant manner. Equivariance with respect to protein structures throughout the whole process is guaranteed by predicting structure updates in local frames based on invariant representations, and then transforming them into global frames using a change-of-basis operation.
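To make the local-frame equivariance argument concrete, the following is a minimal sketch (our own illustration, not the paper's implementation): if the per-residue update delta_i is computed from roto-translation invariant features, then applying it in the local frame, x_i ← x_i + O_i delta_i, commutes with any global rotation R and translation t.

```python
import numpy as np

def apply_update(x, O, delta_local):
    """Translate each residue by an update expressed in its local frame."""
    return x + np.einsum('nij,nj->ni', O, delta_local)

rng = np.random.default_rng(0)
N = 5
x = rng.normal(size=(N, 3))                      # global C-alpha positions
# Orthonormal per-residue frames (QR of random matrices).
O = np.stack([np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(N)])
delta = rng.normal(size=(N, 3))                  # update from invariant features

# A global roto-translation of the input: x -> R x + t, frames -> R O.
R = np.linalg.qr(rng.normal(size=(3, 3)))[0]
t = rng.normal(size=3)
out_transformed = apply_update(x @ R.T + t, np.einsum('ij,njk->nik', R, O), delta)
# ... versus transforming the output: the two paths agree (equivariance).
out_then_transform = apply_update(x, O, delta) @ R.T + t
assert np.allclose(out_transformed, out_then_transform)
```

The identity holds because R x + t + (R O) delta = R (x + O delta) + t for any orthogonal R, so the decoder never needs to see the global pose.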
It is worth mentioning that PROTSEED updates the sequence and structure of all residues in a one-shot manner, leading to a much more efficient inference process. In contrast to previous methods that first generate the structure and then the sequence and rotamers, we allow the model to cross-condition on sequence and structure, and encourage maximal information flow among context features, sequences, and structures, which ensures the fidelity of the generated proteins. We conduct extensive experiments on the Structural Antibody Database (SAbDab) (Dunbar et al., 2014) as well as two protein design benchmark data sets curated from CATH (Orengo et al., 1997), and compare PROTSEED against previous state-of-the-art methods on multiple tasks, ranging from antigen-specific antibody CDR design to context-conditioned protein design and fixed backbone protein design. Numerical results show that our method significantly outperforms previous baselines and can generate high-fidelity proteins in terms of both sequence and structure, while running orders of magnitude faster than sampling-based methods. As a proof of concept, we also show by case studies that PROTSEED is able to perform de novo protein design with new folds.

2. RELATED WORK

Protein Design. The most well-established approaches to protein design mainly rely on hand-crafted energy functions and heuristic sampling algorithms to iteratively search for low-energy protein sequences and conformations (Leaver-Fay et al., 2013; Alford et al., 2017; Tischer et al., 2020). Nevertheless, these conventional methods are computationally intensive and prone to arriving at local optima due to the complicated energy landscape. Recent advances in deep generative models have opened the door to data-driven approaches, and a variety of models have been proposed to generate protein sequences (Rives et al., 2021; Shin et al., 2021; Ferruz et al., 2022) or backbone structures (Anand & Huang, 2018; Eguchi et al., 2022; Trippe et al., 2022). To gain fine-grained control over designed proteins, methods have been developed to predict sequences that can fold into given backbone structures (Ingraham et al., 2019; Jing et al., 2021; Anand et al., 2022; Dauparas et al., 2022), a.k.a. fixed backbone design; these achieve promising results but require the desired protein structure to be known a priori. Recently, a class of approaches that generate both protein sequence and structure by network hallucination has emerged (Anishchenko et al., 2021; Wang et al., 2021); these carry out thousands of gradient descent steps in sequence space to optimize loss functions calculated by pre-trained protein structure prediction models (Yang et al., 2020; Jumper et al., 2021). However, the quality of the designed proteins usually relies on the accuracy of the structure prediction models and is sensitive to different random initializations. On the other hand, attempts have been made to co-design the CDR sequence and structure of antibodies using either autoregressive models (Saka et al., 2021; Jin et al., 2021) or diffusion models (Luo et al., 2022). Nevertheless, they are restricted to proteins with specific domain topologies and often suffer from time-consuming Monte Carlo sampling.
Going beyond antibodies, Anand & Achim (2022) adopt three separate diffusion models to generate the sequences, structures, and rotamers of proteins sequentially. Such a method is inefficient and fails to cross-condition on protein sequence and structure. Our model also seeks to co-design protein sequence and structure, but it cross-conditions on sequence and structure while being much more efficient.

3D Structure Prediction. Our work is also related to approaches that perform 3D structure prediction by iteratively translating structures in three-dimensional space equivariantly (Shi et al., 2021; Luo & Hu, 2021; Hoogeboom et al., 2022; Xu et al., 2022; Zhu et al., 2022). However, these methods represent structures as either molecular graphs or point clouds, and are not directly applicable to protein structures. On the other hand, protein folding models (Jumper et al., 2021; Baek et al., 2021) that perform protein structure prediction require complete protein sequences as well as their Multiple Sequence Alignments (MSAs) as input, and cannot co-design protein sequence and structure directly.

3.1. PRELIMINARIES

Notations. Proteins are macromolecules that can be viewed as chains of amino acids (residues) connected by peptide bonds. In this paper, an amino acid is represented by its type s_i ∈ {1, ..., 20}, its C_α coordinates x_i ∈ R^3, and its frame orientation O_i ∈ SO(3), where i ∈ {1, ..., N} and N is the number of residues in the protein. Together, x_i and O_i define a canonical orientation frame with respect to the N, C, and C_β atoms, from which the backbone atom positions can be derived. We denote the one-hot encoding of the residue type s_i as onehot(s_i). In the protein sequence and structure co-design task, context features are often provided as input to encourage designed proteins to have desired structural properties. These context features can be either single (per-residue) features m_i ∈ R^{c_m} (e.g., secondary structure annotations) or pair features z_ij ∈ R^{c_z} (e.g., binary contact features between residues). With the above notations, a protein with N residues can be compactly denoted as P = {(s_i, x_i, O_i)}_{i=1}^N, and the context features known a priori as {m_i} ∈ R^{N×c_m} and {z_ij} ∈ R^{N×N×c_z}.

Problem Formulation. Given a set of context features {m_i} ∈ R^{N×c_m} and {z_ij} ∈ R^{N×N×c_z}, the task of protein sequence and structure co-design is the joint generation of the residue types and 3D conformation of a protein with N residues, i.e., the conditional generation of P = {(s_i, x_i, O_i)}_{i=1}^N based on {m_i} and {z_ij}. Note that context features vary from setting to setting. For example, in antibody CDR design (Jin et al., 2021; Luo et al., 2022), they are derived from the antibody framework and binding antigen structures with the CDR region masked, while in full protein design (Anand & Achim, 2022), they can be secondary structure annotations and residue-residue contact features.

Overview. In this paper, we formulate protein sequence and structure co-design as an equivariant translation problem in the joint sequence-structure space. Specifically, we develop a trigonometry-aware context encoder to first reason about the geometrical constraints encoded in the context features. Based on the updated context features, protein sequence and structure are jointly generated in an iterative manner by a novel roto-translation equivariant decoder, starting from randomly initialized structures and residue types (illustrated in Figure 2). To model interactions between sequence and structure during decoding, we allow information to flow between context features, structures, and residue types in each translation step. In this way, the generated sequence and structure are ensured to be consistent. The pseudo-code of the whole framework can be found in Algorithm 1. The rest of this section is organized as follows: Section 3.2 introduces the trigonometry-aware context encoder; Section 3.3 elaborates on the iterative joint sequence-structure decoder and the training objectives.
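As an illustration, the per-residue representation P = {(s_i, x_i, O_i)} above can be sketched as a simple container (the class and attribute names here are our own, not from the paper):

```python
import numpy as np

class Protein:
    """Minimal container for the representation P = {(s_i, x_i, O_i)}."""

    def __init__(self, seq, coords, frames):
        seq = np.asarray(seq)        # (N,)      residue types in {0, ..., 19}
        coords = np.asarray(coords)  # (N, 3)    C-alpha positions x_i
        frames = np.asarray(frames)  # (N, 3, 3) orientations O_i in SO(3)
        assert seq.shape[0] == coords.shape[0] == frames.shape[0]
        self.seq, self.coords, self.frames = seq, coords, frames

    def onehot(self):
        """One-hot encoding of residue types, shape (N, 20)."""
        return np.eye(20)[self.seq]
```

The single features {m_i} and pair features {z_ij} would then be arrays of shape (N, c_m) and (N, N, c_z) conditioning the generation of such an object.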

Figure 2: Illustration of the joint sequence-structure translation process. In each translation layer, the network first captures the interactions of the current protein state and context features via SeqIPA, and then translates the protein sequence and structure into the next state equivariantly.

3.2. TRIGONOMETRY-AWARE CONTEXT ENCODER

Given single features and pair features as input, the goal of the context encoder is to capture the interactions between different context features and infer encoded constraints for the subsequent protein sequence-structure co-design. We first embed single features and pair features into a $c$-dimensional space using Multi-Layer Perceptrons (MLPs). We then adopt a stack of $L$ trigonometry-aware update layers to propagate information between single features and pair features. We denote the updated single features and pair features at the $l$-th layer as $\{m_i^l\}$ and $\{z_{ij}^l\}$, respectively. At each layer, the single features are first updated using a variant of multi-head self-attention (denoted as MHA) (Vaswani et al., 2017), with pair features serving as an additional input to bias the attention map. Similar to Jumper et al. (2021), the pair features $z_{ij}^l$ are then updated by a linear projection of the outer product of single features $m_i^{l+1}$ and $m_j^{l+1}$:
$$m_i^{l+1} = \mathrm{MHA}(\{m_i^l\}, \{z_{ij}^l\}), \quad (1)$$
$$z_{ij}^{l+0.5} = z_{ij}^l + \mathrm{Linear}(m_i^{l+1} \otimes m_j^{l+1}), \quad (2)$$
where $\otimes$ is the outer product operation. Notably, we enable information to flow between single features and pair features to better model the interactions between context features. Since pair features (e.g., contact maps and distance maps) are usually related to Euclidean distances and dihedral angles between residues, inspired by AlphaFold 2 (Jumper et al., 2021), we adopt two trigonometry-aware operations (Eq. 4 and Eq. 5) in each layer to maintain geometric consistency and encourage pair features to satisfy the triangle inequality.
Formally, we have:
$$\hat{a}_{ij}, \hat{b}_{ij} = \sigma(z_{ij}^{l+0.5}) \odot \mathrm{Linear}(z_{ij}^{l+0.5}), \quad z_{ij}^{l+0.75} = z_{ij}^{l+0.5} + \sigma(z_{ij}^{l+0.5}) \odot \mathrm{Linear}\Big(\sum_k \hat{a}_{ik} \odot \hat{b}_{jk} + \hat{a}_{ki} \odot \hat{b}_{kj}\Big), \quad (4)$$
$$q_{ij}, k_{ij}, v_{ij}, b_{ij} = \mathrm{Linear}(z_{ij}^{l+0.75}), \quad z_{ij}^{l+1} = z_{ij}^{l+0.75} + \sigma(z_{ij}^{l+0.75}) \odot \sum_k \alpha_{ijk}(v_{ik} + v_{kj}), \quad (5)$$
where $\sigma(\cdot) = \mathrm{sigmoid}(\mathrm{Linear}(\cdot))$, and $\alpha_{ijk} = \mathrm{softmax}_k\big(\tfrac{1}{\sqrt{c}}\, q_{ij}^\top (k_{ik} + k_{kj}) + b_{jk} + b_{ki}\big)$ is the attention score of a novel trigonometry-aware attention. Intuitively, in our trigonometry-aware attention, the pair feature $z_{ij}$ is updated with neighboring features $z_{ik}$ and $z_{kj}$, by enumerating all possible $k$ that form a triangle $\Delta_{ijk}$ in terms of residues. Different from Jumper et al. (2021), we tie the attention score $\alpha_{ijk}$ within each triangle $\Delta_{ijk}$ to reduce the computational burden, while keeping the whole network sensitive to triangular interactions among residues. After $L$ rounds of feature propagation, the updated single features $\{m_i^L\}$ and pair features $\{z_{ij}^L\}$ serve as inputs to the decoder for joint protein sequence-structure design.
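To make the tied-score triangle attention concrete, the following single-head NumPy sketch implements the attention of Eq. 5 under stated assumptions: random projection weights, a single head, and the sigmoid gate omitted for brevity. The function and weight names are hypothetical, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tied_triangle_attention(z, Wq, Wk, Wv, wb):
    """Single-head sketch of the tied-score trigonometry-aware attention.

    z  : (N, N, c) pair features
    Wq, Wk, Wv : (c, c) query/key/value projections
    wb : (c,) projection producing the scalar bias b_ij

    For each pair (i, j), the score over k combines the two edges (i, k)
    and (k, j) of the triangle, and a single attention weight alpha_ijk
    is shared (tied) by both value terms v_ik and v_kj.
    """
    N, _, c = z.shape
    q, k, v = z @ Wq, z @ Wk, z @ Wv
    b = z @ wb                                   # (N, N) scalar bias b_ij
    # scores[i, j, k] = q_ij . (k_ik + k_kj) / sqrt(c) + b_jk + b_ki
    scores = (np.einsum('ijc,ikc->ijk', q, k) +
              np.einsum('ijc,kjc->ijk', q, k)) / np.sqrt(c)
    scores = scores + b[None, :, :] + b.T[:, None, :]
    alpha = softmax(scores, axis=-1)             # tied attention weights
    update = (np.einsum('ijk,ikc->ijc', alpha, v) +
              np.einsum('ijk,kjc->ijc', alpha, v))
    return z + update                            # residual connection
```

Because one attention tensor is shared across both value terms, a single softmax is computed per pair update, which is the cost reduction that the weight tying in the text is after.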

3.3. JOINT SEQUENCE-STRUCTURE DECODER

In this section, we describe the proposed joint sequence-structure decoder, whose goal is to iteratively translate protein sequence and structure into desired states from scratch based on context features. Simply parameterizing the decoder with two neural networks that generate sequence and structure separately is problematic, as we need to ensure consistency between the generated sequence and structure, i.e., that the sequence folds into the structure. Meanwhile, we require our decoder to be roto-translation equivariant (Köhler et al., 2020; Jing et al., 2021) with respect to all protein structures during decoding. To this end, we develop a novel roto-translation equivariant network composed of $T$ consecutive translation layers with weight tying (Dehghani et al., 2018; Bai et al., 2019). In each layer, we update context features, residue structures, and residue types interdependently, by allowing information to propagate among them. It is worth mentioning that the residue types and residue structures of all amino acids are updated in one shot in each translation step, distinct from previous work that generates them autoregressively, which significantly accelerates the decoding procedure. We represent the updated residue types and residue structures at the $t$-th layer as $P^t = \{(s_i^t, x_i^t, O_i^t)\}_{i=1}^N$. With a slight abuse of notation, we denote the updated context features at the $t$-th layer as $\{m_i^t\}$ and $\{z_{ij}^t\}$. Specifically, $\{m_i^0\}$ and $\{z_{ij}^0\}$ are the invariant outputs of the context encoder introduced in Section 3.2. For each amino acid, we initialize the residue type as the uniform distribution over the 20 amino acid types, i.e., $s_i^0 = \frac{1}{20} \cdot \mathbf{1}$, the $C_\alpha$ coordinates as the origin of the global frame, i.e., $x_i^0 = (0, 0, 0)$, and the frame orientation as the identity rotation, i.e., $O_i^0 = I_3$. We elaborate on each translation layer next. Encoding Sequence-Structure Interplay.
In each translation layer, we start by embedding the current residue types into a $c$-dimensional space with a feedforward network $\mathrm{MLP}_e: \mathbb{R}^{20} \to \mathbb{R}^c$, i.e., $s_i^{t+0.5} = \mathrm{MLP}_e(s_i^t)$. We then adopt a variant of Invariant Point Attention (IPA) (Jumper et al., 2021), called SeqIPA, to capture the interplay of residue types, residue structures, and context features, integrating them all into updated context features. Such a practice is often favored in the literature (Anand & Achim, 2022; Luo et al., 2022; Tubiana et al., 2022) as it is aware of the orientation of each residue frame while being roto-translation invariant with respect to input and output features. Distinct from vanilla IPA, our SeqIPA takes residue types as an additional input to bias the attention map and steer the representation of the whole protein generated so far:
$$m_i^{t+1}, z_{ij}^{t+1} = \mathrm{SeqIPA}(\{m_i^t\}, \{z_{ij}^t\}, \{s_i^{t+0.5}\}, \{x_i^t\}, \{O_i^t\}). \quad (6)$$
Note that SeqIPA is orientation-aware with respect to residue structures and roto-translation invariant to all other representations. We refer readers to Appendix B.2 for more details about SeqIPA. Equivariant Structure Translation. Given the updated context features, the protein structure is translated towards the next state by updating $\{x_i^t\}$ and $\{O_i^t\}$. To update the $C_\alpha$ positions $\{x_i^t\}$, we first predict the change of coordinates (denoted as $\{\tilde{x}_i^t\}$) within each local residue frame specified by $\{O_i^t\}$. We then perform a change of basis using $\{O_i^t\}$ to transform $\{\tilde{x}_i^t\}$ from the local frame into the global frame (Kofinas et al., 2021; Hsu et al., 2022) to derive the equivariant deviation of $C_\alpha$ positions (Eq. 7). Intuitively, the deviations of $C_\alpha$ positions rotate accordingly when residue frames rotate, which guarantees the equivariance of the $C_\alpha$ translation step.
The update for the orientation frame $O_i^t$ is computed by predicting a unit quaternion vector (Jia, 2008) with a feedforward network, which is then converted to a rotation matrix $\hat{O}_i^t$ and left-multiplied by $O_i^t$ to rotate the residue frame (Eq. 8). We adopt the unit quaternion here because it is a more concise representation of a 3D rotation than a rotation matrix. Since the predicted unit quaternion is an invariant vector, the predicted rotation matrix $\hat{O}_i^t$ is also invariant. Therefore, the translation step of the residue frame is equivariant due to the multiplication of rotation matrices. We summarize the whole equivariant structure translation step as follows:
$$\tilde{x}_i^t = \mathrm{MLP}_x(m_i^{t+1}, m_i^0), \quad x_i^{t+1} = x_i^t + O_i^t \tilde{x}_i^t, \quad (7)$$
$$\hat{O}_i^t = \mathrm{convert}\big(\mathrm{MLP}_o(m_i^{t+1}, m_i^0)\big), \quad O_i^{t+1} = O_i^t \hat{O}_i^t, \quad (8)$$
where $\mathrm{convert}$ is a function that converts a quaternion to a rotation matrix. One Shot Sequence Translation. The residue types of all amino acids are updated in one shot based on the updated context features and current residue types in each translation step. Specifically, we use a feedforward network $\mathrm{MLP}_s$ to predict residue type distributions over the 20 amino acid types for the next iteration:
$$s_i^{t+1} = \mathrm{softmax}\big(\lambda \cdot \mathrm{MLP}_s(m_i^{t+1}, m_i^0, s_i^{t+0.5})\big). \quad (9)$$

DIFFUSION (Luo et al., 2022) is a diffusion-based method that recently achieved state-of-the-art performance on antibody design. Following previous works (Jin et al., 2021; Luo et al., 2022), we use three metrics to evaluate the quality of designed CDRs: (1) Perplexity (PPL) measures the inverse likelihood of native sequences under the predicted sequence distribution; a lower PPL stands for a higher likelihood. Note that PPL should be calculated for each sequence first and then averaged over the test set. For methods that do not strictly define joint distributions over sequences (e.g., DIFFUSION), we use the final sequence distribution to calculate an approximated PPL.
(2) RMSD is the Root Mean Squared Deviation of $C_\alpha$ atoms between generated CDRs and ground-truth CDRs with antibody frameworks aligned (Ruffolo et al., 2022). A lower RMSD stands for a smaller discrepancy compared to native CDRs, which indicates a better CDR structure for binding the antigen. (3) Amino acid recovery rate (AAR) is the sequence identity between generated and ground-truth CDRs. Results. We notice that the scripts used to calculate the above metrics are inconsistent across previous works. For a fair comparison, we implement all the baselines and run each model three times with different random seeds. Following previous works (Jin et al., 2021; Luo et al., 2022), the length of the CDR is set to be identical to the length of the ground-truth CDR for simplicity, and we sample 100 candidates with the lowest perplexity for each CDR for machine learning-based methods. We report the mean and standard deviation of the above metrics on the test set in Table 1. The numerical results indicate that PROTSEED consistently outperforms previous state-of-the-art baselines by a clear margin on all three metrics for each type of CDR, which confirms PROTSEED's ability to co-design antibodies conditioned on existing binding structures. In particular, as an energy-based method that performs graft antibody design, RABD is inferior to data-driven approaches. DIFFUSION outperforms GNN as it is equivariant and models the orientation of residues similarly to our model, but its performance still falls short of ours. It is worth mentioning that the performance of all models on CDR-H3 is worse than that on the other two CDR types, as CDR-H3 is the most diverse region in an antibody and is critical to antigen binding. We present generated samples from CDR-H3 sequence-structure co-design and their sidechain interactions with binding antigens in Figure 3(b). The PDB ID of the reference complex is 6FLA, and the antigen is Dengue Virus.
The sidechain interactions between the antigen (red) and the antibody (green) are highlighted.

4.2. PROTEIN SEQUENCE AND STRUCTURE CO-DESIGN

Setup. This task evaluates a model's capability to design protein sequence and structure conditioned on context features known a priori. The task was recently brought to attention by Anand & Achim (2022), but to the best of our knowledge there is no rigorous benchmark for it in the field of machine learning. To this end, we collect 31,877 protein structures from the non-redundant set of CATH S40 (Orengo et al., 1997), calculate amino acid secondary structure with DSSP (Kabsch & Sander, 1983), and split the data set into training, validation, and test sets at the topology level with a 90/5/5 ratio. We take secondary structure annotations as single features and binary contact matrices as pair features.¹ This setting represents the situation where biologists already know the topology of the desired proteins (Dou et al., 2018; Shen et al., 2018) and want the model to design novel proteins without specifying structures. For this task, we compare PROTSEED with the following two baselines, which we believe best describe the current landscape of this task. AFDesign² is a hallucination-based (Wang et al., 2021) method that generates protein sequence and structure by iteratively performing gradient descent in sequence space guided by AlphaFold2. We adopt AFDESIGN as one of our baselines because AlphaFold2 is the state-of-the-art protein structure prediction model, and AFDESIGN outperforms other hallucination-based methods. DIFFUSION (Anand & Achim, 2022) is another diffusion-based model that generates protein structure, sequence, and rotamers sequentially, and it is not restricted to antibodies. We use the PPL, RMSD, and AAR metrics introduced in Section 4.1 to evaluate the fidelity of designed proteins. Results. Since AFDESIGN is incapable of handling discrete contact matrices, we set the loss function that guides the gradient descent to the distogram loss computed by a pre-trained AlphaFold2 (Jumper et al., 2021).
For DIFFUSION, we implement it ourselves as its source code has not been released yet. We run each model three times with different random seeds and report the mean and standard deviation of the above metrics on the test set. As shown in Table 2, PROTSEED significantly outperforms all the baselines on all three metrics, which demonstrates its capability to generate proteins of high fidelity as regards both sequence and structure, and shows that it is not limited to specific domain topologies. The performance of AFDESIGN falls short of the other methods as it relies on gradient descent to optimize the sequence, which is prone to getting stuck in local optima due to the rough landscape of the loss function. The performance of DIFFUSION is also inferior to ours, as it generates proteins using three separate diffusion models and fails to cross-condition on structure and sequence. In contrast, PROTSEED updates sequence and structure interdependently in an end-to-end manner. To demonstrate the efficiency of our method, we test the inference stage of the different approaches using a single V100 GPU card on the same machine, and present the average runtime of these methods on proteins of different sizes. As indicated by Figure 3(a), PROTSEED runs orders of magnitude faster than the two baseline models on all four protein sizes, as both AFDESIGN and DIFFUSION rely on time-consuming Monte Carlo sampling steps.

4.3. FIXED BACKBONE DESIGN

Setup. The third task is to design protein sequences that can fold into given backbone structures, which is known as fixed backbone design. In this task, the context features are dihedral angles and inter-residue distances derived solely from backbone coordinates (Jing et al., 2021), and protein structures are fixed to the ground truth in the decoder. We use the CATH 4.2 dataset curated by Ingraham et al. (2019), and follow all experimental settings of Jing et al.
(2021) rigorously for a fair comparison, i.e., using the same data splits and evaluation settings according to their official implementation. We compare PROTSEED with three baselines. Specifically, Structured GNN is an improved version of Ingraham et al. (2019). GVP-GNN (Jing et al., 2021) and GVP-Transformer (Hsu et al., 2022) are state-of-the-art methods for fixed backbone design built upon Geometric Vector Perceptron (GVP) encoders (Jing et al., 2021). Note that since we cannot afford to train models on AlphaFold2-predicted data sets (Hsu et al., 2022), which would require hundreds of GPU days, we focus on the CATH data set for training. We evaluate the performance of all methods using PPL and AAR as introduced in Section 4.1, and drop the RMSD metric as structures are provided in this task. Results. Following Jing et al. (2021), we report the evaluation results on three test splits: the full CATH 4.2 test set, the short subset (100 or fewer residues), and the single-chain subset. As shown in Table 3, PROTSEED achieves competitive results against the state-of-the-art method GVP-GNN in terms of perplexity, and outperforms all baselines in terms of the amino acid recovery rate. The results indicate that PROTSEED is a quite general protein design framework and is also superior at designing protein sequences conditioned on desired backbone structures. Note that GVP-TRANSFORMER is outperformed by GVP-GNN when trained solely on the CATH data set, which is consistent with the results reported in the original paper (Hsu et al., 2022).
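The backbone dihedral context features used in this task can be computed directly from consecutive backbone atom coordinates. Below is a minimal NumPy sketch using the standard four-point torsion formula; it illustrates the featurization in spirit but is not the authors' exact implementation.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (radians) defined by four consecutive atoms,
    e.g. the phi/psi/omega backbone dihedrals used as context features."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # project b0 and b2 onto the plane perpendicular to the central bond b1
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))
```

For residue $i$, one would call this with, e.g., the coordinates of $C_{i-1}, N_i, C_{\alpha,i}, C_i$ to obtain $\phi_i$; inter-residue distances are then simple pairwise norms of the $C_\alpha$ coordinates.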

4.4. CASE STUDY

So far, we have evaluated PROTSEED's capability to design proteins of high fidelity in multiple settings. However, it remains unclear whether the proposed model can go beyond the topologies of existing proteins. To gain insight into the proposed method, and as a proof of concept, we manually construct a set of secondary structure annotations and contact features from scratch, and ask the model trained on the second task to perform de novo protein design based on the context features we provide. In Figure 4, we show that PROTSEED succeeds in altering loop lengths of existing proteins, designing novel proteins with idealized topologies, and designing novel protein complexes with a custom number of secondary structures. Notably, the designed structures are in close agreement with the structures predicted by AlphaFold when taking the designed sequences as input. We further perform protein sequence and structure searches against all available databases using FoldSeek (van Kempen et al., 2022) and BLAST (Altschul et al., 1990), and find that these synthetic proteins are dissimilar to existing proteins regarding both sequence and structure. This case study serves as a first attempt to apply our model to de novo protein design in a more realistic setting, and it reveals the possibility of PROTSEED becoming a powerful tool for protein design in biological research. We refer readers to Appendix C.1 for all details about the case study.

5. CONCLUSION AND FUTURE WORK

In this paper, we propose a novel principle for protein sequence and structure co-design called PROTSEED, which translates proteins in the joint sequence-structure space in an iterative and end-to-end manner. PROTSEED is capable of capturing the interplay of sequence, structure, and context features during the translation, and enjoys a much more efficient inference process thanks to the one-shot translation strategy. Extensive experiments over a wide range of protein design tasks show that PROTSEED outperforms previous state-of-the-art baselines by a large margin, confirming the superiority and generality of our method. Further case studies on de novo protein design demonstrate PROTSEED's potential for more practical applications in biological research. Future work includes extending PROTSEED to the scaffolding task (Trippe et al., 2022) and adopting latent variables (Kingma & Welling, 2013) to enable context-free protein design.

REPRODUCIBILITY STATEMENT

For the sake of reproducibility, the pseudo-code of PROTSEED, the parameterization of SeqIPA, as well as hyper-parameters and implementation details are provided in Appendix B. All codes, datasets, and experimental environments will be released upon the acceptance of this work.

A EQUIVARIANCE PROPERTY OF SEQUENCE AND STRUCTURE TRANSLATION

We first quickly recap the process of sequence and structure translation in each translation layer. At the $(t+1)$-th layer, the decoder takes the protein $P^t = \{(s_i^t, x_i^t, O_i^t)\}_{i=1}^N$ and context features $\{m_i^t\}$, $\{z_{ij}^t\}$ as input. It encodes the sequence-structure interplay and integrates all interactions into updated context features using SeqIPA, adapted from Invariant Point Attention (IPA) (Jumper et al., 2021) in a way that preserves its roto-translation invariance. The updates of $C_\alpha$ positions, frame orientations, and type distributions are then predicted based on the updated context features. The whole process can be summarized as follows:
$$s_i^{t+0.5} = \mathrm{MLP}_e(s_i^t), \quad (12)$$
$$m_i^{t+1}, z_{ij}^{t+1} = \mathrm{SeqIPA}(\{m_i^t\}, \{z_{ij}^t\}, \{s_i^{t+0.5}\}, \{x_i^t\}, \{O_i^t\}), \quad (13)$$
$$\tilde{x}_i^t = \mathrm{MLP}_x(m_i^{t+1}, m_i^0), \quad x_i^{t+1} = x_i^t + \Delta x_i^t = x_i^t + O_i^t \tilde{x}_i^t, \quad (14)$$
$$\hat{O}_i^t = \mathrm{convert}\big(\mathrm{MLP}_o(m_i^{t+1}, m_i^0)\big), \quad O_i^{t+1} = O_i^t \hat{O}_i^t, \quad (15)$$
$$s_i^{t+1} = \mathrm{softmax}\big(\lambda \cdot \mathrm{MLP}_s(m_i^{t+1}, m_i^0, s_i^{t+0.5})\big). \quad (16)$$
To derive the equivariance property of each translation step, we use three functions $\mathcal{X}, \mathcal{O}, \mathcal{S}$ to denote the networks that predict the $C_\alpha$ position translation, orientation translation, and sequence translation described above, respectively. Formally, we have:
$$\Delta x_i^t = \mathcal{X}(P^t), \quad O_i^{t+1} = \mathcal{O}(P^t), \quad s_i^{t+1} = \mathcal{S}(P^t).$$
Note that $\mathcal{X}, \mathcal{O}, \mathcal{S}$ also take $\{m_i^t\}$ and $\{z_{ij}^t\}$ as input; we omit these context features for simplicity, as they remain invariant to global rigid transformations. $\mathcal{X}, \mathcal{O}, \mathcal{S}$ are not separate networks: they share the same input and the same SeqIPA, but are equipped with different MLPs. With the above definitions, we can derive the following proposition: Proposition 1 (Roto-Translation Equivariance). Let $T_{R,r}$ denote any SE(3) transformation (rigid transformation) operating on the protein object $P^t = \{(s_i^t, x_i^t, O_i^t)\}_{i=1}^N$, with a rotation matrix $R \in \mathrm{SO}(3)$ and a translation vector $r \in \mathbb{R}^3$.
The functions $\mathcal{X}, \mathcal{O}, \mathcal{S}$ satisfy the following equivariance properties:
$$\mathcal{X} \circ T_{R,r}(P^t) = R\,\mathcal{X}(P^t), \quad (20)$$
$$\mathcal{O} \circ T_{R,r}(P^t) = R\,\mathcal{O}(P^t), \quad (21)$$
$$\mathcal{S} \circ T_{R,r}(P^t) = \mathcal{S}(P^t), \quad (22)$$
where $T_{R,r}(P^t) = \{(s_i^t, Rx_i^t + r, RO_i^t)\}_{i=1}^N$. Intuitively, the proposition states that in each translation step, the updates of $C_\alpha$ positions and frame orientations are equivariant with respect to the input protein structures, while the updates of type distributions are invariant. Proof. We first prove that Eq. 20 holds. Notice that SeqIPA is aware of the orientations of the input structure, and the updated context features are invariant (Eq. 13). Therefore, the predicted deviation of $C_\alpha$ positions, i.e., $\tilde{x}_i^t$, is invariant (Eq. 14). Then, we have $\mathcal{X} \circ T_{R,r}(P^t) = (RO_i^t)\tilde{x}_i^t = R\,\mathcal{X}(P^t)$. Eq. 21 and Eq. 22 can be proved in a similar way.
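Proposition 1 can also be checked numerically. The NumPy sketch below implements the structure-translation step of Eqs. 7-8 with the invariant quantities (the local displacement and the predicted quaternion) held fixed, as the proof requires; function names are illustrative, not the authors' code.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def translate_step(x, O, x_local, q_pred):
    """One structure-translation step (Eqs. 7-8): the invariant local
    displacement x_local is mapped into the global frame by O, and the
    frame is rotated by the (invariant) predicted quaternion."""
    return x + O @ x_local, O @ quat_to_rotmat(q_pred)
```

Rotating and translating the input residue frame by $(R, r)$ and re-running the step yields exactly the rotated-and-translated output, which is the content of Eqs. 20-21.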

B MODEL DETAILS B.1 PSEUDO CODE

The pseudo-code of PROTSEED is provided in Algorithm 1. The proposed PROTSEED consists of a trigonometry-aware encoder (Algorithm 1, lines 2-7) that reasons about geometrical constraints and interactions from context features, and a roto-translation equivariant decoder (Algorithm 1, lines 10-19) that translates protein sequence and structure interdependently. Starting from the initial single features $\{m_i\} \in \mathbb{R}^{N \times c_m}$ and pair features $\{z_{ij}\} \in \mathbb{R}^{N \times N \times c_z}$, the whole model iteratively translates both protein sequence and structure into the desired state from random initialization (Algorithm 1, line 9). We note that the whole process does not require MCMC sampling, and runs much faster than autoregressive models and diffusion-based models. The trigonometry-aware encoder is composed of a stack of $L$ encoding layers. Each layer takes the single features $\{m_i^l\}$ and pair features $\{z_{ij}^l\}$ from the last layer as its input, and updates these features with novel attention mechanisms. After $L$ rounds of feature propagation, the updated single features and pair features serve as inputs to the decoder for joint protein sequence-structure design (Algorithm 1, line 8). The roto-translation equivariant decoder iteratively refines the concrete 3D atom coordinates of the protein from random initialization, based on the context features calculated by the encoder. The interplay of residue types, residue structures, and context features during the decoding process is captured by a novel orientation-aware attention mechanism (SeqIPA, Appendix B.2). The equivariance property of the structure translation process is guaranteed by mapping invariant predictions in local frames to global frames with a change of basis (Algorithm 1, lines 13-16). It is worth mentioning that PROTSEED updates the sequence and structure of all residues in a one-shot manner, leading to a much more efficient inference process (Algorithm 1, line 17).
Algorithm 1 PROTSEED
Require: Initial single features $\{m_i\} \in \mathbb{R}^{N \times c_m}$ and pair features $\{z_{ij}\} \in \mathbb{R}^{N \times N \times c_z}$.
1: $m_i^0, z_{ij}^0 \leftarrow \mathrm{Linear}(m_i), \mathrm{Linear}(z_{ij})$ ▷ $m_i^0 \in \mathbb{R}^c$, $z_{ij}^0 \in \mathbb{R}^c$
2: for $l \leftarrow 0$ to $L-1$ do
3:   $m_i^{l+1} \leftarrow \mathrm{MHA}(\{m_i^l\}, \{z_{ij}^l\})$ ▷ Eq. 1
4:   $z_{ij}^{l+0.5} \leftarrow z_{ij}^l + \mathrm{Linear}(m_i^{l+1} \otimes m_j^{l+1})$ ▷ Eq. 2
5:   $z_{ij}^{l+0.75} \leftarrow z_{ij}^{l+0.5} + \mathrm{TriangleUpdate}_1(\{z_{ij}^{l+0.5}\})$ ▷ Eq. 4
6:   $z_{ij}^{l+1} \leftarrow z_{ij}^{l+0.75} + \mathrm{TriangleUpdate}_2(\{z_{ij}^{l+0.75}\})$ ▷ Eq. 5
7: end for
8: $m_i^0, z_{ij}^0 \leftarrow m_i^L, z_{ij}^L$ ▷ Initialize context features for the decoder
9: $P^0 \leftarrow \{(s_i^0, x_i^0, O_i^0)\}_{i=1}^N \leftarrow \{(\frac{1}{20} \cdot \mathbf{1}, (0,0,0), I_3)\}_{i=1}^N$ ▷ Initialize protein $P^0$
10: for $t \leftarrow 0$ to $T-1$ do
11:   $s_i^{t+0.5} \leftarrow \mathrm{MLP}_e(s_i^t)$ ▷ $s_i^{t+0.5} \in \mathbb{R}^c$
12:   $m_i^{t+1}, z_{ij}^{t+1} \leftarrow \mathrm{SeqIPA}(\{m_i^t\}, \{z_{ij}^t\}, \{s_i^{t+0.5}\}, \{x_i^t\}, \{O_i^t\})$ ▷ Eq. 6 and Section B.2
13:   $\tilde{x}_i^t \leftarrow \mathrm{MLP}_x(m_i^{t+1}, m_i^0)$ ▷ Deviation of $C_\alpha$ positions in the local frame
14:   $x_i^{t+1} \leftarrow x_i^t + O_i^t \tilde{x}_i^t$ ▷ Deviation of $C_\alpha$ positions in the global frame
15:   $\hat{O}_i^t \leftarrow \mathrm{convert}\big(\mathrm{MLP}_o(m_i^{t+1}, m_i^0)\big)$ ▷ Convert quaternion to rotation matrix
16:   $O_i^{t+1} \leftarrow O_i^t \hat{O}_i^t$ ▷ Update frame orientation, Eq. 8
17:   $s_i^{t+1} \leftarrow \mathrm{softmax}\big(\lambda \cdot \mathrm{MLP}_s(m_i^{t+1}, m_i^0, s_i^{t+0.5})\big)$ ▷ Eq. 9
18:   $P^{t+1} \leftarrow \{(s_i^{t+1}, x_i^{t+1}, O_i^{t+1})\}_{i=1}^N$
19: end for
Return: The trajectory of the protein translation $\{P^t\}_{t=1}^T$.

B.2 PARAMETERIZATION OF SEQIPA

SeqIPA is adapted from Invariant Point Attention (IPA) (Jumper et al., 2021); it takes residue types as an additional input to capture the interactions between the currently decoded sequences, structures, and the context features. We ensure that the additional input does not affect the invariance property of IPA, so as to make full use of its capacity. Specifically, we propose the following two strategies to parameterize SeqIPA. SeqIPA-Addition. Given that $\{s_i^{t+0.5}\}$ share the same dimensionality as $\{m_i^t\}$, a very simple strategy is to add the embeddings of residue types onto the single representations. Following the original implementation of IPA, we leave the pair features unchanged in this approach:
$$m_i^{t+1}, z_{ij}^{t+1} = \mathrm{SeqIPA}(\{m_i^t\}, \{z_{ij}^t\}, \{s_i^{t+0.5}\}, \{x_i^t\}, \{O_i^t\}) \quad (24)$$
$$= \mathrm{IPA}(\{m_i^t + s_i^{t+0.5}\}, \{z_{ij}^t\}, \{x_i^t\}, \{O_i^t\}). \quad (25)$$
The above equations say that for SeqIPA-Addition, we simply add the sequence embeddings $\{s_i^{t+0.5}\}$ onto the single representations $\{m_i^t\}$ and feed the four inputs to vanilla IPA (Jumper et al., 2021). SeqIPA-Attention. Another more sophisticated strategy is to construct a new set of single representations and pair representations based on the embeddings of the current residue types. We then adopt a lightweight encoder, similar to the encoder introduced in Section 3.2, to update $m_i^t$ and $z_{ij}^t$, which are then fed into the vanilla IPA module. We summarize the computation flow as follows:
$$\tilde{m}_i, \tilde{z}_{ij} = \mathrm{Linear}(s_i^{t+0.5}), \mathrm{Linear}(s_i^{t+0.5} + s_j^{t+0.5}), \quad (26)$$
$$\tilde{m}_i, \tilde{z}_{ij} = \mathrm{Encoder}(\{\tilde{m}_i\}, \{\tilde{z}_{ij}\}), \quad (27)$$
$$m_i^{t+0.5}, z_{ij}^{t+0.5} = m_i^t + \tilde{m}_i, \; z_{ij}^t + \tilde{z}_{ij}, \quad (28)$$
$$m_i^{t+1}, z_{ij}^{t+1} = \mathrm{SeqIPA}(\{m_i^t\}, \{z_{ij}^t\}, \{s_i^{t+0.5}\}, \{x_i^t\}, \{O_i^t\}) \quad (29)$$
$$= \mathrm{IPA}(\{m_i^{t+0.5}\}, \{z_{ij}^{t+0.5}\}, \{x_i^t\}, \{O_i^t\}). \quad (30)$$
The above equations say that for SeqIPA-Attention, we leverage another lightweight encoder (Eq. 27), similar to the encoder introduced in Section 3.2, to first update $m_i^t$ and $z_{ij}^t$ (Eq. 28), and then feed the updated inputs to vanilla IPA (Eq. 30).
In practice, we find both strategies work well and their performance is on par with each other. To make the whole model lightweight, we adopt the first strategy across all the experiments in this work. We emphasize that the parameterization of the SeqIPA is quite flexible, as long as it can model interactions between sequences, structures, and context features, and is invariant to the global transformation of input structures. For the concrete computation flows of the IPA module, we refer readers to Algorithm 22 described in the supplementary material of Jumper et al. (2021) .
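The invariance argument behind SeqIPA-Addition can be illustrated with a toy single-head stand-in (not the actual IPA of Algorithm 22 in Jumper et al., 2021): the residue-type embeddings are simply added to the single features, and every geometric term entering the attention logits is an inter-residue distance, so the output is unchanged under a global roto-translation. All names below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def toy_seq_ipa_addition(m, z, s_emb, x, wz):
    """Toy stand-in for SeqIPA-Addition (Eq. 25).

    m     : (N, c) single features
    z     : (N, N, cz) pair features
    s_emb : (N, c) residue-type embeddings, added onto m
    x     : (N, 3) C-alpha coordinates
    wz    : (cz,) projection of pair features to a scalar attention bias

    Attention logits combine a feature dot product, a pair-feature bias,
    and squared inter-residue distances; every term is invariant to
    global rotations and translations of x.
    """
    h = m + s_emb                                          # the addition strategy
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)    # invariant distances
    logits = h @ h.T / np.sqrt(h.shape[1]) + z @ wz - d2
    return softmax(logits) @ h
```

Because the sequence embeddings enter only through the invariant single features, the addition cannot break the invariance of the attention module, which is the property SeqIPA is designed to preserve.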

B.3 HYPER-PARAMETERS AND IMPLEMENTATION DETAILS

PROTSEED is implemented in PyTorch. The trigonometry-aware context encoder is implemented with $L = 8$ layers, and the sequence-structure decoder is implemented with $T = 8$ layers. The hidden dimension is set to 128 for pair features and 256 for single features across all modules. For training, we use a learning rate of 0.001 with 2000 linear warmup iterations. We empirically find that a proper learning rate warmup schedule can lead to a faster convergence rate and higher performance. The model is optimized with the Adam optimizer on four Tesla V100 GPU cards with distributed data parallel. The estimated time to obtain a converged model is 24 hours. For inference, the temperature of the sequence distribution, i.e., $\lambda$, controls the sharpness of the distribution. A larger $\lambda$ leads to a higher (better) AAR and a higher (worse) PPL, and vice versa. Since it acts oppositely on AAR and PPL, we simply set it to 1 across the experiments. All codes, datasets, and experimental environments will be released upon the acceptance of this work.
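The effect of the temperature $\lambda$ on the final sequence distribution (Eq. 9) can be seen in a two-line sketch: scaling the logits before the softmax sharpens the distribution, which is the AAR/PPL trade-off described above. The function name is illustrative.

```python
import numpy as np

def sequence_distribution(logits, lam=1.0):
    """Temperature-scaled softmax over the 20 amino acid types (Eq. 9).
    A larger lam yields a sharper (more confident) distribution."""
    e = np.exp(lam * (logits - logits.max()))
    return e / e.sum()
```

The argmax (and hence AAR against a fixed prediction) is unchanged by $\lambda > 0$; only the probability mass assigned to the top type, and therefore the perplexity, shifts.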

B.4 FULL ATOM POSITIONS RECONSTRUCTION

The bond lengths and bond angles between backbone atoms are relatively conserved. The $C_\alpha$ is connected with the N, C, and $C_\beta$ atoms (except for Glycine, which has a single hydrogen atom as its side chain). The $C_\alpha$ forms a canonical orientation frame with respect to N, C, and $C_\beta$. Once we know the position of the $C_\alpha$ and the orientation of the frame, the full backbone atom positions can be derived according to their averaged relative positions with respect to the $C_\alpha$ recorded in the literature. The positions of all sidechain atoms of the 20 different amino acids can also be compactly specified by four torsion angles $(\chi_1, \chi_2, \chi_3, \chi_4)$ (Jumper et al., 2021; McPartlon & Xu, 2022), which also follow amino-acid-specific distributions recorded in the literature. All these recorded statistics can be found at https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt.

To gain more insights into the effectiveness of each module in PROTSEED, we conduct additional ablation studies following the setting of the antibody CDR co-design in Section 4.1. Effectiveness of Cross-conditioning on Sequence and Structure. In this ablation study (denoted as No Sequence Interaction), we replace the SeqIPA in Eq. 6 with the vanilla IPA (Jumper et al., 2021), and directly predict the distribution of amino acid types for all residues at the last iteration of the decoding process using the single features, i.e., $\{m_i^T\}$. The results shown in Table 4 indicate that when the model fails to cross-condition on both sequence and structure during decoding, there is a significant performance drop on all three metrics, especially PPL and AAR. This confirms the necessity of cross-conditioning on sequence and structure during decoding, and the effectiveness of the proposed SeqIPA. Effectiveness of Iterative Translations.
In this ablation study (denoted as Single Iteration Translation), we replace the T-layer decoder with a single-layer decoder for protein translation. We note that the vanilla decoder of PROTSEED is composed of T = 8 consecutive translation layers with tied weights, so the two models have the same number of trainable parameters and the comparison is fair. As indicated by Table 4, the single-layer decoder is outperformed by the vanilla decoder by a large margin on all metrics. The results justify the advantages of the iterative translation framework for protein sequence and structure co-design.

Effectiveness of Context Feature Update. In this ablation study (denoted as No SeqIPA), we freeze all context feature updates in the decoder, remove SeqIPA (Eq. 6), and use the outputs of the encoder as the context features throughout the decoding process. As shown by Table 4, the performance degrades dramatically, which demonstrates that the context feature update plays a key role in PROTSEED.
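As a concrete illustration of the frame-based reconstruction described in Section B.4, the sketch below maps idealized local atom offsets into global coordinates given per-residue C α positions and orientation frames. The numeric offsets are illustrative placeholders, not the averaged literature values from the stereo_chemical_props.txt file linked above; the function name and array conventions are ours.

```python
import numpy as np

# Idealized offsets of backbone atoms in the local frame of each residue,
# with C-alpha at the origin. These numbers are placeholders for the
# averaged relative positions recorded in the literature.
LOCAL_OFFSETS = {
    "N":  np.array([-0.572, 1.337, 0.000]),
    "CA": np.array([ 0.000, 0.000, 0.000]),
    "C":  np.array([ 1.526, 0.000, 0.000]),
}

def reconstruct_backbone(x_ca, O):
    """Map idealized local offsets into global coordinates.

    x_ca: (N, 3) C-alpha positions.
    O:    (N, 3, 3) per-residue rotation matrices (orientation frames).
    Returns a dict of (N, 3) arrays, one per backbone atom name.
    """
    # global position = C-alpha position + frame rotation applied to offset
    return {name: x_ca + np.einsum("nij,j->ni", O, off)
            for name, off in LOCAL_OFFSETS.items()}
```

With identity frames the atoms land exactly at the local offsets relative to each C α; rotating a frame rotates its residue's backbone atoms rigidly around the C α.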



In this work, we say two residues are in contact if the distance between their C α atoms is within 8 Å.

https://github.com/sokrypton/ColabDesign
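The contact definition above can be computed directly from C α coordinates; a minimal sketch (function name is ours):

```python
import numpy as np

def contact_map(ca_coords, cutoff=8.0):
    """Binary contact matrix: residues i and j are in contact if the
    distance between their C-alpha atoms is within `cutoff` angstroms.

    ca_coords: (L, 3) array of C-alpha positions.
    Returns an (L, L) symmetric 0/1 matrix.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= cutoff).astype(np.int8)
```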



Figure 1: Illustration of three protein design tasks with different context features. (a) Antigen-specific CDR co-design given the structure and sequence of the antibody framework and the binding antigen. (b) Protein sequence-structure co-design conditioned on secondary structure (SS) annotations and binary contact features. (c) Fixed backbone sequence design conditioned on given backbone structures.


Figure 3: (a) Average runtime of three approaches on proteins of different sizes. Our PROTSEED runs orders of magnitude faster than the gradient-based method AFDESIGN and the diffusion-based method DIFFUSION. (b) Examples of CDR-H3 sequence and structure co-designed by our method. The PDB ID of the reference complex is 6FLA, and the antigen is Dengue Virus. The sidechain interactions between the antigen (red) and the antibody (green) are highlighted.

Figure 4: Examples of novel proteins designed by PROTSEED. (a) Extending the loop of a native protein (marked in red). (b) Novel β-barrel design with different sizes. (c) Transmembrane protein complex design with a custom number (twelve) of α-helices.

PPL, RMSD and AAR of different approaches on the antibody CDR co-design task. (↑): the higher the better. (↓): the lower the better.

PPL, RMSD and AAR of different approaches on the protein co-design task. (↑): the higher the better. (↓): the lower the better.

PPL and AAR of different approaches on the fixed backbone sequence design task. (↑): the higher the better. (↓): the lower the better. Results of baselines are taken fromJing et al. (2021).

Table 4: Ablation study on the antibody CDR co-design task. (↑): the higher the better. (↓): the lower the better.

ACKNOWLEDGEMENT

We would like to thank all the reviewers for their insightful comments. Jian Tang is supported by Twitter, Intel, the Natural Sciences and Engineering Research Council (NSERC) Discovery Grant, the Canada CIFAR AI Chair Program, Samsung Electronics Co., Ltd., Amazon Faculty Research Award, Tencent AI Lab Rhino-Bird Gift Fund, an NRC Collaborative R&D Project (AI4D-CORE-06) as well as the IVADO Fundamental Research Project grant PRF-2019-3583139727.


where λ is a hyper-parameter controlling the temperature of the distribution. See Appendix B.3 for discussions on hyper-parameters and the equivariance property of sequence and structure translation.

Summarizing the above, at the (t + 1)-th layer, the decoder takes P t = {(s t i , x t i , O t i )} N i=1 as input and computes P t+1 = {(s t+1 i , x t+1 i , O t+1 i )} N i=1 as output. Based on P t , we can efficiently reconstruct the full backbone atom positions according to their averaged relative positions with respect to C α recorded in the literature (Engh & Huber, 2012; Jumper et al., 2021). It is worth mentioning that, distinct from previous works that can only generate backbone atom positions (Jin et al., 2021; Kong et al., 2022; Luo et al., 2022), our model is capable of full atom position generation by attaching the corresponding sidechain to each residue. Specifically, we can enable the decoder to generate four additional torsion angles (χ 1 , χ 2 , χ 3 , χ 4 ) (McPartlon & Xu, 2022) that specify the geometry of sidechain atoms, and reconstruct sidechain atom positions together with backbone atom positions.

Training Objective. We denote the reconstructed full backbone atom positions at the t-th layer as {x t ij }, where j ∈ {1, 2, 3} indexes the backbone atoms (N, C α , C). We use the superscript "true" to denote the ground-truth value for simplicity. The whole network can be jointly optimized by defining an invariant cross-entropy loss ℓ ce over type distributions and an invariant frame align loss (Jumper et al., 2021) over full backbone atom positions at each translation layer:

$$\mathcal{L}_{\mathrm{type}} = \frac{1}{TN} \sum_{t=1}^{T} \sum_{i=1}^{N} \ell_{\mathrm{ce}}\big(p_i^t, s_i^{\mathrm{true}}\big), \qquad (10)$$

$$\mathcal{L}_{\mathrm{pos}} = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{3N^2} \sum_{k=1}^{N} \sum_{i=1}^{N} \sum_{j=1}^{3} \left\| \rho_k^t\big(x_{ij}^t\big) - \rho_k^{\mathrm{true}}\big(x_{ij}^{\mathrm{true}}\big) \right\|, \qquad (11)$$

where ρ t k (·) transforms the backbone coordinates from the global coordinate system into the local frame of the k-th residue at step t (specified by x t k and O t k ), and ρ true k does the same with the ground-truth frame. The loss defined in Eq. 11 essentially calculates the discrepancy of all backbone positions between the prediction and the ground truth, by aligning each residue frame one by one. As all backbone coordinates are transformed into local frames, the loss stays invariant when two protein structures differ by an arbitrary rotation and an arbitrary translation. Combining Eq. 10 and Eq. 11, the final training objective is L = L type + L pos .
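As a sanity check on the invariance claim, the frame align loss of Eq. 11 can be sketched in NumPy as follows. The tensor shapes, the uniform averaging, and the function name are our assumptions; PROTSEED's actual implementation (and AlphaFold2's clamped FAPE) may differ in details such as clamping and weighting.

```python
import numpy as np

def frame_align_loss(x_pred, R_pred, t_pred, x_true, R_true, t_true):
    """Rotation/translation-invariant frame align loss (sketch of Eq. 11).

    x_*: (N, 3, 3) backbone atom positions (N residues; atoms N, CA, C).
    R_*: (N, 3, 3) per-residue rotations; t_*: (N, 3) frame translations.
    Every atom position is expressed in every residue's local frame,
    rho_k(x) = R_k^T (x - t_k), before comparing prediction and truth.
    """
    flat_pred = x_pred.reshape(-1, 3)                 # (N*3, 3) atoms
    flat_true = x_true.reshape(-1, 3)
    d_pred = flat_pred[None, :, :] - t_pred[:, None, :]
    d_true = flat_true[None, :, :] - t_true[:, None, :]
    # apply R_k^T to every atom under every frame k: (N, N*3, 3)
    loc_pred = np.einsum("kji,kaj->kai", R_pred, d_pred)
    loc_true = np.einsum("kji,kaj->kai", R_true, d_true)
    return float(np.mean(np.linalg.norm(loc_pred - loc_true, axis=-1)))
```

Applying a global rotation and translation to the predicted structure and its frames leaves the loss unchanged, which is the invariance property the text argues for.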

4. EXPERIMENTS

Following previous works (Jin et al., 2021; Jing et al., 2021; Anand & Achim, 2022) , we conduct extensive experiments and evaluate the proposed PROTSEED on the following three tasks: Antibody CDR Co-Design (Section 4.1), Protein Sequence-Structure Co-Design (Section 4.2), and Fixed Backbone Sequence Design (Section 4.3). We also show cases where PROTSEED successfully conducts de novo protein sequence design with new folds in Section 4.4. We describe all experimental setups and results in task-specific sections.

4.1. ANTIBODY CDR CO-DESIGN

Setup. The first task is to design the CDR sequence and structure of antibodies, where the context features are amino acid types as well as inter-residue distances derived from antibody-antigen complexes with the CDRs removed. The initial protein structure P 0 is set to the complex structure except for the CDRs, which are randomly initialized. We retrieved antibody-antigen complexes in Apr. 2022 from the Structural Antibody Database (SAbDab) (Dunbar et al., 2014) and removed incomplete or redundant complexes, resulting in a subset of 2,900 complex structures. Following Jin et al. (2021), we focus on the design of heavy-chain CDRs and curate three data splits, one for each type of CDR (denoted H1, H2, H3), by clustering the corresponding CDR sequences via MMseqs2 (Steinegger & Söding, 2017) at 40% sequence identity. In total, there are 641, 963, and 1,646 clusters for CDR H1, H2, and H3, respectively. The clusters are then divided into training, validation, and test sets with a ratio of 8:1:1.

We compare PROTSEED with the following three baselines. RosettaAntibodyDesign (RAbD) (Adolf-Bryfogle et al., 2018) is a physics-based antibody design software. GNN is an autoregressive model that co-designs sequence and structure, similar to Jin et al. (2021); we note that we cannot directly compare PROTSEED with Jin et al. (2021) as the setting is different. Diffusion (Luo et al., 2022) is a diffusion-based co-design model.

Figure 5: Superimposition of three generated proteins and their most similar proteins found in the PDB by FoldSeek. Left: a novel protein with an extended loop. Middle: a novel β-barrel. Right: a novel helical complex.
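As an illustration of the cluster-level split described in the setup above, the following sketch divides sequence-cluster identifiers (e.g. from MMseqs2 at 40% identity) into training/validation/test sets at a ratio of 8:1:1, so that no CDR sequence cluster is shared across splits. The shuffling, seeding, and function name are our assumptions, not the authors' exact procedure.

```python
import random

def split_clusters(cluster_ids, ratios=(8, 1, 1), seed=0):
    """Split cluster ids into train/val/test at the cluster level.

    Splitting whole clusters (rather than individual sequences)
    prevents near-duplicate CDRs from leaking across splits.
    """
    ids = sorted(set(cluster_ids))
    random.Random(seed).shuffle(ids)  # deterministic shuffle
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```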

C EXPERIMENTAL DETAILS

C.1 CASE STUDY

We conduct three case studies to evaluate PROTSEED's capability for de novo protein design: extending the loop of existing proteins, designing novel β-barrels, and designing novel helical complexes. Specifically, we manually curate a set of secondary structure annotations and contact features, and ask the model trained on the second task (Section 4.2) to generate novel proteins based on these context features. We elaborate on how we design the context features for each setting.

Extending the Loop. In this setting, we start by calculating the secondary structure annotations and the contact matrix of an existing protein. We then insert n consecutive "C" letters ("C" is the secondary structure annotation for a loop) into the original secondary structure annotations at the position where we want to extend the loop. Similarly, we insert n consecutive rows and columns filled with zeros into the original contact matrix. For a newly inserted residue indexed by i, we let it be in contact with residues i − 2, i − 1, i, i + 1, and i + 2.

Designing Novel β-Barrels. In this setting, we take a simple pattern of β-barrel proteins and repeat this pattern multiple times to construct the contact features. The secondary structure annotations are also obtained by repeating the annotations of the pattern multiple times.

Designing Novel Helical Complexes. Similar to the second case, in this setting we also take a simple pattern of helical complexes and construct the contact features by repeating it multiple times. The secondary structure annotations are all set to "H".

In Figure 5, we show the superimposition of the three novel proteins designed by PROTSEED against the most similar proteins in the PDB, one for each setting, which confirms the novelty of the designed proteins.
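The loop-extension procedure described above can be sketched as follows. The function name and array conventions are our own illustration, not the authors' code.

```python
import numpy as np

def extend_loop(ss, contacts, pos, n):
    """Insert an n-residue loop at index `pos` into the context features.

    ss:       string of per-residue SS annotations ('H', 'E', 'C', ...).
    contacts: (L, L) binary contact matrix.
    Returns the extended annotation string and (L+n, L+n) contact matrix.
    """
    new_ss = ss[:pos] + "C" * n + ss[pos:]   # 'C' annotates a loop
    L = len(ss)
    new_c = np.zeros((L + n, L + n), dtype=contacts.dtype)
    # copy the four quadrants of the original matrix around the insertion
    new_c[:pos, :pos] = contacts[:pos, :pos]
    new_c[:pos, pos + n:] = contacts[:pos, pos:]
    new_c[pos + n:, :pos] = contacts[pos:, :pos]
    new_c[pos + n:, pos + n:] = contacts[pos:, pos:]
    # each inserted residue i is in contact with i-2 .. i+2
    for i in range(pos, pos + n):
        for j in range(max(i - 2, 0), min(i + 3, L + n)):
            new_c[i, j] = new_c[j, i] = 1
    return new_ss, new_c
```

The β-barrel and helical-complex settings would instead tile a small contact/annotation pattern with `np.tile`-style repetition, following the same context-feature format.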

D.1 CONTEXT FEATURES

In this work, we use the concept of context features (single features {m i } and pair features {z ij }) as a formulation to unify the inputs of different protein design tasks. We note that {m i } and {z ij } vary from task to task, and for most well-defined tasks they are easy to obtain. For example, in antibody CDR design tasks (Jin et al., 2021; Luo et al., 2022; Kong et al., 2022), they can be derived from antibody frameworks and the structures of binding antigens. In the general protein design task proposed by Anand & Achim (2022), they can be derived from secondary structure annotations and contact maps provided by biologists. In scaffolding tasks, they can be derived from a starting motif (Trippe et al., 2022). The more context features (or constraints) researchers specify, the more control they have over the designed proteins. We refer readers to Dou et al. (2018) and Shen et al. (2018) for two cases of protein design in real-world scenarios.

D.2 CONTEXT-FREE PROTEIN DESIGN

Researchers may be interested in generating novel proteins without relying on any context features, a.k.a. context-free protein design. We note that PROTSEED is a general framework that can handle this situation with minor modifications. Specifically, we can adopt the Variational Autoencoder (VAE) framework (Kingma & Welling, 2013) and approximate the data distribution by learning to map proteins to latent vectors and reconstruct proteins from these latents. In this scenario, the context features become the proteins sampled from the target data distribution. Since this is out of the scope of this work, we leave it as future work.

