EDGEFORMERS: GRAPH-EMPOWERED TRANSFORMERS FOR REPRESENTATION LEARNING ON TEXTUAL-EDGE NETWORKS

Abstract

Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews). However, mainstream network representation learning models focus on propagating and aggregating node attributes, lacking specific designs to utilize text semantics on edges. While there exist edge-aware graph neural networks, they directly initialize edge attributes as a feature vector, which cannot fully capture the contextualized text semantics of edges. In this paper, we propose Edgeformers¹, a framework built upon graph-enhanced Transformers, to perform edge and node representation learning by modeling texts on edges in a contextualized way. Specifically, in edge representation learning, we inject network information into each Transformer layer when encoding edge texts; in node representation learning, we aggregate edge representations through an attention mechanism within each node's ego-graph. On five public datasets from three different domains, Edgeformers consistently outperform state-of-the-art baselines in edge classification and link prediction, demonstrating their efficacy in learning edge and node representations, respectively.
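The node representation learning step described above, aggregating incident-edge representations with attention inside a node's ego-graph, can be illustrated with a minimal NumPy sketch. This is not the paper's actual implementation: the edge representations, the query vector, and all dimensions here are random placeholders standing in for outputs of the graph-enhanced text encoder and learned parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16

# Hypothetical setup: a center node with 4 incident edges, each already
# encoded into a d-dimensional edge representation (in Edgeformers this
# would come from the graph-enhanced Transformer; random here).
edge_reprs = rng.normal(size=(4, d))

# Query vector for the center node (a learned parameter in practice).
node_query = rng.normal(size=d)

# Attention aggregation within the ego-graph: score each incident edge
# against the node query, softmax-normalize, and take the weighted sum.
scores = edge_reprs @ node_query / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

node_repr = weights @ edge_reprs  # final node representation, shape (d,)
print(node_repr.shape)  # (16,)
```

The softmax weights let edges with text more relevant to the node dominate the aggregation, rather than averaging all incident edges uniformly as a mean-pooling GNN readout would.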

1. INTRODUCTION

Networks are ubiquitous and are widely used to model interrelated data in the real world, such as user-user and user-item interactions on social media (Kwak et al., 2010; Leskovec et al., 2010) and recommender systems (Wang et al., 2019; Jin et al., 2020). In recent years, graph neural networks (GNNs) (Kipf & Welling, 2017; Hamilton et al., 2017; Velickovic et al., 2018; Xu et al., 2019) have demonstrated their power in network representation learning. However, the vast majority of GNN models leverage node attributes only and lack specific designs to capture information on edges. (We refer to these models as node-centric GNNs.) Yet, in many scenarios, there is rich information associated with edges in a network. For example, when a person replies to another on social media, there will be a directed edge between them accompanied by the response text; when a user comments on an item, the user's review will be naturally associated with the user-item edge. To utilize edge information during network representation learning, some edge-aware GNNs (Gong & Cheng, 2019; Jiang et al., 2019; Yang & Li, 2020; Jo et al., 2021) have been proposed. Nevertheless, these studies assume the information carried by edges can be directly described as an attribute vector. This assumption holds well when edge features are categorical (e.g., bond features in molecular graphs (Hu et al., 2020) and relation features in knowledge graphs (Schlichtkrull et al., 2018)). However, effectively modeling free-text edge information in edge-aware GNNs has remained elusive, mainly because the bag-of-words and context-free embeddings (Mikolov et al., 2013) used in previous edge-aware GNNs cannot fully capture contextualized text semantics. For example, "Byzantine" in history book reviews and "Byzantine" in distributed systems papers should have different meanings given their contexts, but they correspond to the same entry in a bag-of-words vector and have the same context-free embedding.
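The contrast between context-free and contextualized embeddings can be demonstrated with a toy NumPy sketch. The vocabulary, dimensions, and single self-attention layer below are illustrative assumptions, not any real PLM: the point is only that a lookup table assigns "byzantine" one fixed vector regardless of its sentence, while an attention layer that mixes surrounding tokens yields different vectors in different contexts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; "byzantine" appears in two different contexts.
vocab = {"byzantine": 0, "empire": 1, "history": 2, "fault": 3, "tolerance": 4}
d = 8

# Context-free embedding (word2vec-style): one fixed vector per word.
table = rng.normal(size=(len(vocab), d))

def context_free(tokens):
    return np.stack([table[vocab[t]] for t in tokens])

def contextualized(tokens):
    # One self-attention layer: each output vector is a mixture of all
    # tokens in the sentence, so the same word becomes context-dependent.
    x = context_free(tokens)
    scores = x @ x.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

s1 = ["byzantine", "empire", "history"]      # history-book context
s2 = ["byzantine", "fault", "tolerance"]     # distributed-systems context

cf1, cf2 = context_free(s1)[0], context_free(s2)[0]
ctx1, ctx2 = contextualized(s1)[0], contextualized(s2)[0]

print(np.allclose(cf1, cf2))    # True: identical vector in both contexts
print(np.allclose(ctx1, ctx2))  # False: vectors now differ by context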
To accurately capture contextualized semantics, a straightforward idea is to integrate pretrained language models (PLMs) (Devlin et al., 2019; Liu et al., 2019; Clark et al., 2020) with GNNs. In node-centric GNN studies, this idea has been instantiated by a PLM-GNN cascaded architecture (Fang et al., 2020; Li et al., 2021; Zhu et al., 2021), where text information is first encoded by a PLM and then aggregated by a GNN. However, such architectures process text and graph signals one after



¹ Code can be found at https://github.com/PeterGriffinJin/Edgeformers.

