IMPROVING PROTEIN INTERACTION PREDICTION USING PRETRAINED STRUCTURE EMBEDDING

Abstract

The study of protein-protein interactions (PPIs) plays an important role in the discovery of protein drugs and in revealing the behavior and function of cells. So far, most PPI prediction works focus on protein sequence and PPI network structure, but ignore the structural information of protein physical binding. This is a limitation because interacting proteins are not necessarily similar, and similar proteins do not necessarily interact with each other. In this paper, we design a novel method, called PSE4PPI, which leverages pretrained structure embeddings that capture additional structural information and physical pairwise relationships between amino acids. The method can also be transferred to new PPI prediction tasks, such as antibody-target interactions and PPIs across different species. Experimental results show that our pretrained structure embedding leads to significant improvement in PPI prediction compared to sequence-based and network-based methods. Furthermore, we show that embeddings pretrained on PPIs from different species can be transferred to improve prediction for human proteins.

1. INTRODUCTION

Proteins are the basic functional units of human biology. However, they rarely function alone and usually act through interactions with other proteins. Protein-protein interactions (PPIs) are important for studying cytomics and discovering new putative therapeutic targets to cure diseases Szklarczyk et al. (2015). However, obtaining PPI results usually requires expensive and time-consuming wet-lab experiments. The purpose of PPI prediction is to predict whether physical binding exists between two proteins, given their amino acid sequences.



Figure 1: Illustration of the PPI network and protein-protein physical binding.

Recently, most PPI prediction works have focused on protein sequence Sun et al. (2017); Hashemifar et al. (2018); Zhang et al. (2019) and PPI network structure Hamilton et al. (2017); Yang et al. (2020). These methods obtain protein representations from the amino acid sequences of the proteins and the local neighborhood structure of the PPI network, and then predict whether an interaction exists between proteins. The PPI prediction problem is often formalized as link prediction

