DECENTRALIZED KNOWLEDGE GRAPH REPRESENTATION LEARNING

Abstract

Knowledge graph (KG) representation learning methods have achieved competitive performance in many KG-oriented tasks, among which the best are usually based on graph neural networks (GNNs), a powerful family of networks that learn the representation of an entity by aggregating the features of its neighbors and itself. However, many KG representation learning scenarios only provide structure information that describes the relationships among entities, leaving entities with no input features. In this case, existing aggregation mechanisms are incapable of inducing embeddings for unseen entities, as these entities have no pre-defined features to aggregate. In this paper, we present a decentralized KG representation learning approach, decentRL, which encodes each entity from and only from the embeddings of its neighbors. For optimization, we design an algorithm that distills knowledge from the model itself, such that the output embeddings continuously gain knowledge from the corresponding original embeddings. Extensive experiments show that the proposed approach outperformed many cutting-edge models on the entity alignment task and achieved competitive performance on the entity prediction task. Furthermore, under the inductive setting, it significantly outperformed all baselines on both tasks.

1. INTRODUCTION

Knowledge graphs (KGs) support many data-driven applications (Ji et al., 2020). Recently, learning low-dimensional representations (a.k.a. embeddings) of entities and relations in KGs has received increasing attention (Rossi et al., 2020). We find that existing models for KG representation learning share similar characteristics with those for word representation learning. For example, TransE (Bordes et al., 2013), a well-known translational KG embedding model, interprets a triple (e_1, r, e_2) as e_1 + r ≈ e_2, where e_1, e_2, r denote the subject, the object and their relationship, respectively, and the boldfaces denote the corresponding embeddings. If we view e_1 as a word in a sentence, and e_2 as well as the many other objects of e_1 as its context words, then TransE and many other KG embedding models (Wang et al., 2014; Dettmers et al., 2018; Nguyen et al., 2018; Kazemi & Poole, 2018; Sun et al., 2019) learn representations in a form similar to that used in Skip-gram (Mikolov et al., 2013a), where the input representation is learned to predict the context (i.e., neighbor) representations. Recently, many graph neural network (GNN) based models for KG representation learning (Wang et al., 2018; Schlichtkrull et al., 2018; Cao et al., 2019; Wu et al., 2019; Sun et al., 2020; Vashishth et al., 2020) have achieved state-of-the-art performance in KG-related tasks such as entity alignment and entity prediction. These models learn KG representations in a CBOW (continuous bag-of-words) (Mikolov et al., 2013a) manner, in which the context entities are aggregated to predict the target. However, they also consider the representation of an entity itself when aggregating the neighborhood information. This design prevents such models (e.g., GCN (Kipf & Welling, 2017) and GAT (Velickovic et al., 2018)) from generalizing to unseen entities. In many cases, the entities in prevalent KG-related tasks do not have self features.
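To make the translational intuition concrete, the following minimal sketch (a hypothetical illustration, not the authors' implementation) scores a triple under TransE, where a plausible triple satisfies e_1 + r ≈ e_2 and thus scores close to zero:

```python
import numpy as np

def transe_score(e1, r, e2, norm=1):
    """TransE plausibility score of a triple (e1, r, e2): the distance
    ||e1 + r - e2||. Lower scores indicate more plausible triples."""
    return np.linalg.norm(e1 + r - e2, ord=norm)

e1 = np.array([0.1, 0.2])
r = np.array([0.3, -0.1])
e2 = np.array([0.4, 0.1])     # satisfies e1 + r ≈ e2
transe_score(e1, r, e2)       # ≈ 0: a plausible triple
transe_score(e1, r, -e2)      # larger: an implausible triple
```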
This motivates us to learn entity representations from and only from their context neighbors. We propose a decentralized KG representation learning approach, decentRL. The key idea of decentRL is to decentralize the semantic information of an entity over only its neighbors (i.e., the distributed context vector in CBOW (Mikolov et al., 2013b)), which can be implemented simply by representing each entity as the average of its neighbor embeddings. In this paper, we look for a more efficient but still simple way to realize this concept on the most popular graph attention network (GAT) (Velickovic et al., 2018), as well as its many variants (Sun et al., 2020; Vashishth et al., 2020). We illustrate the methodology with the decentralized attention network (DAN), which is based on the vanilla GAT. DAN supports KG representation learning for unseen entities with only structure information, which is essentially different from relying on self features (e.g., attribute information) as in existing graph embedding models (Hamilton et al., 2017; Bojchevski & Günnemann, 2018; Hettige et al., 2020). Furthermore, the neighbors in DAN jointly determine the attention weights, which makes DAN more robust and more expressive than the conventional graph attention mechanism (Velickovic et al., 2018). Another key problem in decentralized KG representation learning is how to estimate and optimize the output embeddings. If we distribute the information of an entity over its neighbors, then, conversely, the original embedding e_i of this entity also learns how to effectively participate in the aggregations of its different neighbors. Suppose that we have obtained an output representation g_i from DAN for entity e_i; we can simply estimate and optimize g_i by aligning it with e_i. However, directly minimizing the L1/L2 distance between g_i and e_i may be insufficient.
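The decentralization idea in its simplest form (a hedged sketch using a plain neighbor average in place of DAN's attention; all names are illustrative) represents an entity from and only from its neighbors, so an unseen entity with known edges can still be embedded:

```python
import numpy as np

def decentralized_embed(entity, neighbors, emb):
    """Embed `entity` as the average of its neighbors' embeddings.
    The entity's own embedding is never consulted, so this also works
    for unseen entities that only have structure (edge) information."""
    return np.mean([emb[n] for n in neighbors[entity]], axis=0)

emb = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
neighbors = {"new_entity": ["a", "b"]}   # unseen: "new_entity" not in emb
decentralized_embed("new_entity", neighbors, emb)   # -> [0.5, 0.5]
```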
Specifically, these two embeddings have completely different roles and functions in the model, and the shared information may not reside in the same dimensions. Therefore, maximizing the mutual information between them is a better choice. Different from existing works such as MINE (Belghazi et al., 2018) and InfoNCE (van den Oord et al., 2018), in this paper we design a self-distillation algorithm, called auto-distiller. It alternately optimizes g_i and its potential target e_i, such that g_i can automatically and continuously distill knowledge from the original representation e_i across different batches. The main contributions of this paper are listed as follows.

2. BACKGROUND

Knowledge Graph. A KG can be viewed as a multi-relational graph, in which nodes represent real-world entities and edges carry specific labels to represent different relationships between entities. Formally, we define a KG as a 3-tuple G = (E, R, T), with E and R denoting the sets of entities and relationships, respectively, and T denoting the set of relational triples.

KG Representation Learning. Conventional models are mainly based on the idea of Skip-gram. According to the types of their score functions, these models can be divided into three categories: translational models (e.g., TransE (Bordes et al., 2013) and TransR (Lin et al., 2015a)), semantic matching models (e.g., DistMult (Yang et al., 2015) and ComplEx (Trouillon et al., 2016)) and neural models (e.g., ConvE (Dettmers et al., 2018) and RSN (Guo et al., 2019)). We refer interested readers to the surveys (Wang et al., 2017; Ji et al., 2020) for details. Recently, GNN-based models, which are closely related to this paper, have received great attention in this field. Specifically, R-GCN (Schlichtkrull et al., 2018), AVR-GCN (Ye et al., 2019) and CompGCN (Vashishth et al., 2020) introduce different relation-specific composition operations to combine neighbors and the corresponding relations before neighbor aggregation. RDGCN (Wu et al., 2019) refactors KGs as dual relation graphs (Monti et al., 2018), in which edge labels are represented as nodes for graph convolution. All the aforementioned GNN-based models use GCNs and/or GATs to aggregate the neighbors of an entity, where an identity matrix is added to the adjacency matrix. This operation is helpful when elements have self features, but poses a problem in learning the representations of unseen entities, to which no self features are attached. By contrast, decentRL relies solely on the neighbor context to attend to the neighbors of each entity in linear complexity, which is efficient and easy to deploy.

Entity Alignment.
Entity alignment aims to find potentially aligned entity pairs between two different KGs G_1 = (E_1, R_1, T_1) and G_2 = (E_2, R_2, T_2), given a limited number of aligned pairs S ⊂ E_1 × E_2 as training data. Oftentimes, G_1 and G_2 are merged into a joint KG G = (E, R, T), which enables models to learn representations in a unified space.
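Once representations are learned in the unified space, alignment predictions are commonly read off by nearest-neighbor search between the two entity sets. The sketch below is a generic illustration of this readout (the L2 distance and the toy data are assumptions, not the paper's evaluation protocol):

```python
import numpy as np

def nearest_alignment(emb1, emb2):
    """For each entity of KG1, return its nearest entity of KG2
    by L2 distance in the joint embedding space."""
    names2 = list(emb2)
    mat2 = np.stack([emb2[n] for n in names2])
    pairs = {}
    for n1, v1 in emb1.items():
        dists = np.linalg.norm(mat2 - v1, axis=1)
        pairs[n1] = names2[int(np.argmin(dists))]
    return pairs

emb1 = {"Berlin@G1": np.array([0.9, 0.1])}
emb2 = {"Berlin@G2": np.array([1.0, 0.0]),
        "Paris@G2": np.array([0.0, 1.0])}
nearest_alignment(emb1, emb2)   # -> {"Berlin@G1": "Berlin@G2"}
```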

(1) We propose decentralized KG representation learning, and present DAN as a prototype of the graph attention mechanism under the open-world setting. (2) We design an efficient knowledge distillation algorithm to support DAN in generating representations of unseen entities. (3) We implement an end-to-end framework based on DAN and auto-distiller. Experiments show that it achieved superior performance on two prevalent KG representation learning tasks (i.e., entity alignment and entity prediction), and also significantly outperformed cutting-edge models under the open-world setting.
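To illustrate the optimization idea behind auto-distiller (aligning the output g_i with its target e_i by maximizing mutual information rather than minimizing an L1/L2 distance), the sketch below uses a generic InfoNCE-style contrastive loss over a batch. This is a simplified stand-in for intuition only, not the paper's exact alternating algorithm:

```python
import numpy as np

def infonce_loss(g, e):
    """InfoNCE-style contrastive loss between outputs g and targets e:
    each g[i] should score higher with its own e[i] than with the other
    rows of e (in-batch negatives). Minimizing this loss maximizes a
    lower bound on the mutual information between g and e."""
    logits = g @ e.T                                   # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                # -log p(e_i | g_i)

e = np.eye(4) * 3.0
infonce_loss(e, e)                 # near zero: each g_i matches its own e_i
infonce_loss(np.ones((4, 4)), e)   # ≈ log(4): g carries no information about e
```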

