PROTEIN STRUCTURE REPRESENTATION LEARNING THROUGH ORIENTATION-AWARE GRAPH NEURAL NETWORKS

Abstract

By folding into particular 3D structures, proteins perform key roles in living organisms. To learn meaningful representations from a protein structure for downstream tasks, not only the global backbone topology but also the local fine-grained orientational relations between amino acids should be considered. In this work, we propose Orientation-Aware Graph Neural Networks (OAGNNs) to better sense the geometric characteristics of protein structures (e.g., inner-residue torsion angles and inter-residue orientations). By extending each weight from a scalar to a 3D vector, we construct a rich set of geometrically meaningful operations that process both the classical and SO(3) representations of a given structure. To plug our designed perceptron unit into existing Graph Neural Networks, we further introduce an equivariant message passing paradigm, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments show that OAGNNs have a remarkable ability to sense geometric orientational features compared to classical networks. OAGNNs also achieve state-of-the-art performance on various computational biology applications involving protein 3D structures.

1. INTRODUCTION

Built from a sequence of amino-acid residues, a protein performs its biological functions by folding into a particular conformation in 3D space. Therefore, accurately utilizing such 3D structures is key for downstream analysis. While we have witnessed remarkable progress in protein structure prediction (Rohl et al., 2004; Källberg et al., 2012; Baek et al., 2021; Jumper et al., 2021), another thread of tasks that take protein 3D structures as input has started to draw great interest, such as function prediction (Hermosilla et al., 2020; Gligorijević et al., 2021), decoy ranking (Lundström et al., 2001; Kwon et al., 2021; Wang et al., 2021), protein docking (Duhovny et al., 2002; Shulman-Peleg et al., 2004; Gainza et al., 2020; Sverrisson et al., 2021), and driver mutation identification (Lefèvre et al., 1997; Antikainen & Martin, 2005; Li et al., 2020; Jankauskaitė et al., 2019). Most existing works on modeling protein structures directly borrow models designed for other applications, including 3D-CNNs (Ji et al., 2012) from computer vision, Transformers (Vaswani et al., 2017) from natural language processing, and GNNs (Kipf & Welling, 2016) from data mining. Though compatible with general objects, these models overlook the subtleties of fine-grained geometry, which are far more essential in protein structures. For instance, given an amino acid in a protein structure, as shown in Figure 1, the locations of the four backbone atoms (N, Cα, C, and O) determine a local skeleton, and different residues interact with each other through specific orientations between their local frames, both of which have important impacts on the protein structure and its function (Nelson et al., 2008). Recent attempts at building geometry-aware neural networks mainly focus on baking 3D rigid transformations into network operations, leading to the area of SO(3)-invariant and equivariant networks.
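To make the notion of a residue's local frame concrete, the sketch below builds an orthonormal frame from backbone N, Cα, and C coordinates via Gram–Schmidt orthogonalization. This is a common convention in the literature, not necessarily the exact construction used in this paper; the function name and axis ordering are illustrative assumptions.

```python
import numpy as np

def local_frame(n, ca, c):
    """Orthonormal local frame at a residue from backbone N, CA, C coordinates.

    A minimal Gram-Schmidt sketch (hypothetical helper, axis convention
    assumed): e1 along CA->C, e2 the orthogonalized CA->N direction,
    e3 their cross product. Returns a 3x3 matrix whose rows are the axes.
    """
    u = c - ca                      # CA -> C direction
    v = n - ca                      # CA -> N direction
    e1 = u / np.linalg.norm(u)
    e2 = v - np.dot(v, e1) * e1     # remove the component along e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)           # completes a right-handed frame
    return np.stack([e1, e2, e3])
```

Because each axis is built from difference vectors, the frame is translation-invariant, and under a global rotation R the frame transforms as F → F Rᵀ, which is exactly the SO(3)-equivariant behavior the local-frame view relies on.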
One representative work is the Vector Neuron Network (VNN) (Deng et al., 2021), which achieves SO(3)-equivariance on point clouds by generalizing scalar neurons to 3D vectors. Another is GVP-GNN (Jing et al., 2021), which similarly vectorizes hidden neurons in GNNs and demonstrates better prediction accuracy on protein design and quality evaluation tasks. However,

