BEYOND LINK PREDICTION: ON PRE-TRAINING KNOWLEDGE GRAPH EMBEDDINGS

Abstract

Knowledge graph embedding (KGE) models provide low-dimensional representations of entities and relations in a knowledge graph (KG). Most prior work focuses on training and evaluating KGE models for the task of link prediction; the question of whether or not KGE models provide useful representations more generally remains largely open. In this work, we explore the suitability of KGE models (i) for more general graph-structure prediction tasks and (ii) for downstream tasks such as entity classification. For (i), we found that commonly trained KGE models often perform poorly at structural tasks other than link prediction. Based on this observation, we propose a more general multi-task training approach, which includes additional self-supervised tasks such as neighborhood prediction or domain prediction. In our experiments, these multi-task KGE models showed significantly better overall performance for structural prediction tasks. For (ii), we investigate whether KGE models provide useful features for a variety of downstream tasks. Here we view KGE models as a form of self-supervised pre-training and study the impact of both model training and model selection on downstream task performance. We found that multi-task pre-training can (but does not always) significantly improve performance and that KGE models can (but do not always) compete with or even outperform task-specific GNNs trained in a supervised fashion. Our work suggests that more research is needed on the relation between pre-training KGE models and their suitability for downstream applications.

1. INTRODUCTION

Knowledge graph embeddings (KGEs) provide low-dimensional representations of the entities and relations of a knowledge graph (KG). Although a large number of KGE models have been proposed in the literature (see, for example, the surveys of Nickel et al. (2015), Wang et al. (2017), and Ji et al. (2021)), most prior work focuses on the task of link prediction, i.e., answering questions such as (Austin, capitalOf, ?) by reasoning over an incomplete KG. In addition to link prediction, it is often argued that KGEs provide representations that capture semantic properties of entities and, indeed, pre-trained KGE models have been used to inject structured knowledge into language models (He et al., 2020; Zhang et al., 2019), visual models (Baier et al., 2017), recommender systems (El-Kishky et al., 2022; Wang et al., 2018), question answering systems (Ilyas et al., 2022), and other types of downstream models (Wang et al., 2017). The question of whether pre-trained KGE models provide generally useful representations remains largely open. Likewise, it is not well understood how choices made during model training and model selection affect these representations.

In this work, we shed light on these questions from multiple directions. First, we study the suitability of out-of-the-box KGE models for basic graph-structure prediction tasks beyond link prediction. In particular, we consider the tasks of predicting the relation of a triple as suggested by Chang et al. (2020) (e.g., the relationship between Austin and Texas), the domain and range of a relation (e.g., whether Austin is a capital), as well as the entity and relation neighborhood of each entity (e.g., which other entities are related to Austin). Perhaps surprisingly, we found that commonly trained KGE models often performed poorly on such tasks, challenging the intuition that KGE models capture graph structure well.
Second, we investigate whether KGE models provide suitable pre-trained representations for node-level downstream tasks such as entity classification (e.g., the profession of a person) or regression (e.g., the average rating of a movie). To do so, we conducted an empirical study using 27 downstream tasks on two different KGs. We found that out-of-the-box KGE models often perform decently on these tasks and, in fact, the best KGE models can (but do not always) exceed the performance of recent graph neural networks such as KE-GCN (Yu et al., 2021). However, the KGE models with the best downstream task performance were often not the best-performing models for link prediction. For example, we found that the basic TransE model (Bordes et al., 2013) may be superior to KGE models better suited to link prediction such as ComplEx (Trouillon et al., 2016) or RotatE (Sun et al., 2019). This suggests that link prediction performance is not necessarily indicative of downstream task performance.

Both of these findings suggest that the focus on link prediction tasks is too narrow for pre-training KGE models, i.e., for providing generally useful features. We thus explore whether the performance of KGE models for both graph-structure prediction and downstream tasks can be improved by better pre-training and model selection. Inspired by multi-task approaches in other areas, such as natural language processing (Aribandi et al., 2022; Sanh et al., 2022) and computer vision (Doersch & Zisserman, 2017), we included the graph-structure prediction tasks discussed above as additional training objectives and as evaluation measures during model selection. In particular, we propose a multi-task training (MTT) and a multi-task ranking (MTR) approach, both of which can be used with an arbitrary KGE model class and without a substantial increase in computational cost.
In our experimental study, the resulting multi-task KGE models achieved significantly better overall performance on graph-structure prediction tasks and often (but not always) also led to better downstream task performance. We also found that downstream task performance could be further improved by using a smaller set of pre-training tasks. The results suggest that the optimal choice of tasks depends on the dataset, the KGE model class, and the downstream task, and may be difficult to determine in practice.

In summary, the contributions of this paper are as follows: (i) We show empirically that commonly trained KGE models fail at basic graph-structure prediction tasks beyond link prediction. (ii) We propose novel multi-task training and ranking approaches that address this shortcoming. (iii) We explore the impact of standard and multi-task training, as well as of different approaches for model selection, on downstream task performance. (iv) We contextualize KGE model performance with results obtained from recent graph neural networks, which, in contrast to KGE models, are trained directly on each downstream task. Although our work takes a step toward improved pre-training of KGE models, it also suggests that more research is needed on the relation between pre-training KGE models and their general suitability for downstream applications.

2. PRELIMINARIES AND RELATED WORK

We briefly describe KGE models, training and evaluation methods for link prediction, as well as prior work on other tasks. A more comprehensive discussion can be found in surveys such as Nickel et al. (2015), Wang et al. (2017), and Ji et al. (2021).

Link prediction.

A knowledge graph G ⊆ E × R × E is a collection of (subject, predicate, object)-triples over a set E of entities and a set R of relations. Triples represent known facts such as (Austin, capitalOf, Texas). In the KGE literature, the link prediction task is the task of inferring the subject or object in questions of the form (?, capitalOf, Texas) and (Austin, capitalOf, ?), respectively.

KGE models.

KGE models (Sun et al., 2019; Trouillon et al., 2016; Bordes et al., 2013) represent each entity and each relation of a KG with a low-dimensional embedding, commonly a real or complex vector. KGE models have an associated scoring function s : E × R × E → ℝ that associates each triple with a real-valued score. Intuitively, high scores indicate plausible triples, low scores implausible ones. Commonly, the scoring function depends on the input triple only through the embeddings of its arguments. For example, TransE (Bordes et al., 2013) is a translation-based model with s(i, k, j) = −‖e_i + r_k − e_j‖, where e_i ∈ R^d and r_k ∈ R^d denote entity and relation embeddings of dimensionality d > 0, respectively. Scoring functions can be more involved, e.g., based on convolutional neural networks (Dettmers et al., 2018) or transformers (Chen et al., 2021a).

Standard training.

KGE models are commonly trained on the link prediction task. We only give a high-level description here. For each triple (s, p, o) in the training data G_train, KGE models are trained
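To make the scoring-function view concrete, the following is a minimal NumPy sketch of TransE scoring and of link prediction as ranking; the embeddings are randomly initialized stand-ins for trained parameters, and the function names (`score`, `predict_object`, `predict_relation`) are our own illustrative choices, not part of any particular KGE library.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 5, 3, 8

# Randomly initialized embeddings stand in for trained parameters.
E = rng.normal(size=(num_entities, dim))   # entity embeddings e_i
R = rng.normal(size=(num_relations, dim))  # relation embeddings r_k

def score(s: int, p: int, o: int) -> float:
    """TransE score s(i, k, j) = -||e_i + r_k - e_j||; higher = more plausible."""
    return -float(np.linalg.norm(E[s] + R[p] - E[o]))

def predict_object(s: int, p: int) -> int:
    """Link prediction (s, p, ?): rank all entities by score, return the best."""
    return int(np.argmax([score(s, p, o) for o in range(num_entities)]))

def predict_relation(s: int, o: int) -> int:
    """Relation prediction (s, ?, o): rank all relations by score."""
    return int(np.argmax([score(s, p, o) for p in range(num_relations)]))
```

Note that graph-structure prediction tasks such as relation prediction reuse the same scoring function; only the argument being ranked over changes.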

