MATCHING RECEPTOR TO ODORANT WITH PROTEIN LANGUAGE AND GRAPH NEURAL NETWORKS

Abstract

Odor perception in mammals is triggered by interactions between volatile organic compounds and a subset of hundreds of proteins called olfactory receptors (ORs). Molecules activate these receptors according to a complex combinatorial code that allows mammals to discriminate a vast number of chemical stimuli. Recently, ORs have gained attention as new therapeutic targets following the discovery of their involvement in other physiological processes and diseases. To date, predicting molecule-induced activation of ORs remains highly challenging, since 43% of ORs have no identified active compound. In this work, we combine the [CLS] token from protBERT with a molecular graph and propose a tailored GNN architecture incorporating inductive biases from protein-molecule binding. We abstract the biological process of protein-molecule activation as the injection of a molecule into a protein-specific environment. On a newly gathered dataset of 46 700 OR-molecule pairs, this model outperforms state-of-the-art models for drug-target interaction prediction as well as standard GNN baselines. Moreover, by incorporating non-bonded interactions, the model is able to work with mixtures of compounds. Finally, our predictions reveal a similar activation pattern for molecules within a given odor family, which is in agreement with the theory of combinatorial coding in olfaction.

1. INTRODUCTION

The mammalian sense of smell constantly provides information about the composition of the volatile chemical environment and is able to discriminate thousands of different molecules. At the atomic scale, volatile organic compounds are recognized by specific interactions with protein receptors expressed at the surface of olfactory neurons (Buck & Axel, 1991). Mammalian epithelium expresses hundreds of different olfactory receptors (ORs), belonging to the G protein-coupled receptors (GPCRs), which constitute the largest known multigene family (Niimura & Nei, 2003). The recognition of odorants by ORs is based on the complementarity of structures and on hydrophobic or van der Waals interactions, which leads to low molecular affinity (Katada et al., 2005). With the exception of a few conserved amino acids, the sequences of ORs show little identity. In particular, the ligand-binding pocket has hypervariable residues (Pilpel & Lancet, 1999) that are relatively well conserved between orthologs. This property gives ORs the ability to bind a wide variety of molecules that differ in structure, size, or chemical properties. Odorants are recognized according to a combinatorial code of activation (Malnic et al., 1999): each odorant is recognized by several ORs, whereas an individual OR can bind several odorants with distinct affinities and specificities (Zhao et al., 1998). This combinatorial code is sensitive to subtle modifications, so the response of a single receptor can have a major influence on smell perception. Even a small sequence modification can affect odorant responsiveness (Keller et al., 2007; Mainland et al., 2014). Conversely, structural and functional modifications of an odorant can abolish the interaction with a specific receptor (Katada et al., 2005), and even lead to a different smell perception (Sell, 2006). So far, the combinatorial code of the majority of odorants remains unknown. Identifying the recognition spectrum of each OR is therefore essential to decipher the mechanisms of the olfactory system and subsequently build models capable of cracking the combinatorial code of activation.
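
To make the notions of combinatorial code and recognition spectrum concrete, the following minimal sketch represents activation as a mapping from odorants to the receptors they activate. It is an illustration only: the receptor and odorant names are hypothetical placeholders, the activation values are invented for the example rather than taken from any dataset, and the helper functions are not part of the model described in this paper.

```python
# Toy illustration only: names are hypothetical placeholders and the
# activation values are invented for this example, not measured data.
RECEPTORS = ["OR_A", "OR_B", "OR_C", "OR_D"]

# Each odorant maps to the set of receptors it activates.
ACTIVATION = {
    "odorant_1": {"OR_A", "OR_B"},
    "odorant_2": {"OR_B", "OR_C"},
    "odorant_3": {"OR_A", "OR_B", "OR_D"},
}


def code_vector(odorant):
    """Binary activation pattern of an odorant over RECEPTORS (its combinatorial code)."""
    active = ACTIVATION[odorant]
    return [1 if receptor in active else 0 for receptor in RECEPTORS]


def recognition_spectrum(receptor):
    """Sorted list of odorants that activate the given receptor."""
    return sorted(o for o, active in ACTIVATION.items() if receptor in active)


def code_distance(odorant_a, odorant_b):
    """Hamming distance between two codes: the number of receptors that respond differently."""
    return sum(x != y for x, y in zip(code_vector(odorant_a), code_vector(odorant_b)))


if __name__ == "__main__":
    for name in ACTIVATION:
        print(name, code_vector(name))
    print("OR_B spectrum:", recognition_spectrum("OR_B"))
    print("distance(odorant_1, odorant_3):", code_distance("odorant_1", "odorant_3"))
```

In this toy setting, two odorants whose codes differ at a single receptor illustrate how the response of one OR can be enough to separate otherwise identical activation patterns.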

