ENHANCED TEMPORAL KNOWLEDGE EMBEDDINGS WITH CONTEXTUALIZED LANGUAGE REPRESENTATIONS

Abstract

World knowledge exists in both structured forms (tables, knowledge graphs) and unstructured forms (texts). Recently, there have been extensive research efforts on integrating structured factual knowledge with unstructured textual knowledge. However, most studies focus on incorporating static factual knowledge into pre-trained language models, while there is less work on enhancing temporal knowledge graph embeddings with textual knowledge. Existing integration approaches cannot be applied to temporal knowledge graphs (tKGs) since they often assume that knowledge embeddings are time-invariant. In fact, entity embeddings in tKG embedding models usually evolve over time, which poses the challenge of aligning temporally relevant textual information with entities. To this end, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which uses tKG quadruples as an implicit measure to temporally align textual data with the time-evolving entity representations, and uses a novel knowledge-text prediction task to inject textual information into temporal knowledge embeddings. ECOLA jointly optimizes the knowledge-text prediction objective and the temporal knowledge embedding objective, and thus can simultaneously take full advantage of textual and structured knowledge. Since existing datasets do not provide tKGs with aligned textual data, we introduce three new datasets for training and evaluating ECOLA. Experimental results on the temporal knowledge graph completion task show that ECOLA outperforms state-of-the-art tKG embedding models by a large margin.

1. INTRODUCTION

Knowledge graphs (KGs) have long been considered an effective and efficient way to store structured knowledge about the world. A knowledge graph consists of a collection of triples (s, p, o), where s (subject entity) and o (object entity) correspond to nodes and p (predicate) indicates the edge type (relation) between the two entities. Common knowledge graphs (Toutanova et al., 2015; Dettmers et al., 2018) assume that the relations between entities are static. However, in the real world, there are not only static facts and properties but also time-evolving relations associated with entities. For example, the political relationship between two countries might worsen because of trade disputes. To this end, temporal knowledge graphs (tKGs) (Tresp et al., 2015) were introduced, which capture the temporal aspects of relations by extending a triple to a quadruple, adding a timestamp or time interval that describes when the relation is valid, e.g., (Argentina, deep comprehensive strategic partnership with, China, 2022). Extensive studies have focused on learning temporal knowledge embeddings (Leblay & Chekol, 2018; Han et al., 2020c), which not only help infer missing links in tKGs but also benefit various knowledge-related downstream applications, such as temporal question answering (Saxena et al., 2021b). However, knowledge graph embedding often suffers from the sparseness of knowledge graphs. For example, the tKG model proposed by Han et al. (2020a) performs much better on dense tKGs than on sparse ones. To address this problem, some recent studies incorporate textual information to enrich knowledge embeddings. KEPLER (Wang et al., 2021) learns the representation of an entity by encoding the entity description with a pre-trained language model (PLM) and optimizing the knowledge embedding objective. KG-Bert (Yao et al., 2019) takes the entity and relation descriptions of a triple as the input of a PLM and turns knowledge graph completion into a sequence classification problem. However, these models do not take the temporal nature and the evolutionary dynamics of knowledge graphs into account.

In tKG embedding models, entity representations usually evolve over time as entities are involved in different events at different timestamps. Taking financial crises as an example, companies are more likely to be involved in events such as laying off employees; but when the economy recovers, companies hire staff again rather than cut jobs. Thus, entity representations should be able to drift over time to capture such changes. Therefore, for a given entity, a model should account for which textual knowledge is relevant to it at which timestamp. We call this challenge temporal alignment between texts and tKGs, i.e., establishing a correspondence between textual knowledge and its temporal knowledge graph counterpart. Existing approaches cannot handle this challenge because they assume knowledge embeddings are static and use a time-invariant description of an entity to enhance its representation; they are therefore not appropriate for temporal knowledge graph scenarios where temporal alignment is required. A further challenge is that temporal knowledge embedding models learn entity representations as functions of time, which exposes another limitation of existing approaches: their architectures cannot be naturally combined with tKG models. It is thus not clear how to enhance temporal knowledge embeddings with textual data.
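As a concrete illustration of the data structures discussed above, the following sketch (the class and field names are illustrative, not from the paper) represents a tKG fact as a quadruple, i.e., a triple extended with a timestamp, and pairs it with an aligned event description:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quadruple:
    subject: str
    predicate: str
    obj: str
    timestamp: str  # e.g. an ISO date; real tKGs may use discrete time steps

# A static triple (s, p, o) extended with a timestamp.
quad = Quadruple("Argentina", "partnership_with", "China", "2022-02-06")

# Temporal alignment: pair the quadruple with a temporally relevant event
# description, so the text can enhance the time-dependent embeddings of the
# entities and the predicate at that timestamp.
aligned_pair = (quad, "Argentina signed a comprehensive strategic "
                      "partnership agreement with China in 2022.")
```

Note that the same entity pair may appear in many quadruples with different timestamps, each aligned with a different description; this is precisely what a time-invariant entity description cannot express.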
To this end, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which uses temporally relevant textual knowledge to enhance time-dependent knowledge graph embeddings while ensuring that the enhanced embeddings preserve their temporal nature. Specifically, we solve the temporal alignment challenge by using tKG quadruples as an implicit measure. We pair each quadruple with relevant textual data, e.g., an event description, which corresponds to the temporal relation between entities at a specific time. We then use the event description to enhance the representations of the entities and the predicate involved in the given quadruple. In particular, we encode entities and predicates with a tKG embedding model and encode texts using token embeddings. Given a quadruple-text pair, we concatenate the embeddings of the entities, the predicate, and the textual tokens and feed them into a pre-trained language model. We introduce a novel knowledge-text prediction (KTP) task to inject textual knowledge into the temporal knowledge embeddings. The KTP task is an extended masked language modeling task, which randomly masks words in the text and entities/predicates in the quadruple. With the help of the KTP task, ECOLA learns to recognize mentions of the subject and object entities and to align semantic relationships in the text with the predicate in the quadruple. The model can thus take full advantage of the abundant information in the textual data, which is especially helpful for embedding entities and predicates that appear in only a few quadruples. ECOLA jointly optimizes the knowledge-text prediction and temporal knowledge embedding objectives. Since our goal is to develop an approach that can generally improve any potential tKG model, we combine ECOLA with different benchmark tKG embedding models (Goel et al., 2020; Han et al., 2020c; 2021).
For training ECOLA, we need datasets with temporal KG quadruples and aligned textual event descriptions, which are unavailable in existing temporal KG benchmark datasets. Thus, we construct three new temporal knowledge graph datasets by adapting two existing datasets, i.e., GDELT (Leetaru & Schrodt, 2013) and Wiki (Dasgupta et al., 2018), and an event extraction dataset (Li et al., 2020). To make a fair comparison with other temporal KG embedding models and keep inference fast, we only take the enhanced



Figure 1: An example of a temporal knowledge graph with textual event descriptions.

