JAKET: JOINT PRE-TRAINING OF KNOWLEDGE GRAPH AND LANGUAGE UNDERSTANDING

Abstract

Knowledge graphs (KGs) contain rich information about world knowledge, entities, and relations. Thus, they can be great supplements to existing pre-trained language models. However, it remains a challenge to efficiently integrate information from KGs into language modeling. Conversely, understanding a knowledge graph requires related textual context. We propose a novel joint pre-training framework, JAKET, to model both the knowledge graph and language. The knowledge module and language module provide essential information to mutually assist each other: the knowledge module produces embeddings for entities in text, while the language module generates context-aware initial embeddings for entities and relations in the graph. Our design enables the pre-trained model to easily adapt to unseen knowledge graphs in new domains. Experimental results on several knowledge-aware NLP tasks show that our proposed framework achieves superior performance by effectively leveraging knowledge in language understanding.

1. INTRODUCTION

Pre-trained language models (PLMs) leverage large-scale unlabeled corpora for self-supervised training. They have achieved remarkable performance in various NLP tasks, exemplified by BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019b), XLNet (Yang et al., 2019), and the GPT series (Radford et al., 2018; 2019; Brown et al., 2020). It has been shown that PLMs can effectively characterize linguistic patterns in text and generate high-quality context-aware representations (Liu et al., 2019a). However, these models struggle to grasp world knowledge about entities and relations (Poerner et al., 2019; Talmor et al., 2019), which is crucial for language understanding.

Knowledge graphs (KGs) represent entities and relations in a structured way. They can also alleviate the sparsity problem in text modeling. For instance, a language model may require tens of instances of the phrase "labrador is a kind of dog" in its training corpus before it implicitly learns this fact. In comparison, a knowledge graph can precisely represent this fact with two entity nodes, "labrador" and "dog", connected by a relation edge "is a".

Recently, some efforts have been made to integrate knowledge graphs into PLMs. Most of them combine the token representations in a PLM with representations of aligned KG entities. The entity embeddings in these methods are either pre-computed from an external source by a separate model (Zhang et al., 2019; Peters et al., 2019), which may not align easily with the language representation space, or learned directly as model parameters (Févry et al., 2020; Verga et al., 2020), which often suffer from over-parameterization due to the large number of entities. Moreover, all previous works share a common challenge: when the pre-trained model is fine-tuned in a new domain with a previously unseen knowledge graph, it struggles to adapt to the new entities, relations, and structure.
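The triple view of the "labrador is a kind of dog" example can be sketched as a tiny in-memory knowledge graph. This is a hypothetical illustration of how a KG stores a fact as a single (head, relation, tail) triple, not code from JAKET; the `TinyKG` class and its methods are invented for this example.

```python
# Minimal illustration (not JAKET code): a KG stores the "labrador is a dog"
# fact as one (head, relation, tail) triple, whereas a language model must
# infer the same fact statistically from many occurrences in text.
from collections import defaultdict

class TinyKG:
    def __init__(self):
        self.triples = set()                 # all (head, relation, tail) facts
        self.neighbors = defaultdict(set)    # head -> {(relation, tail), ...}

    def add(self, head, relation, tail):
        """Add one fact: two entity nodes linked by a relation edge."""
        self.triples.add((head, relation, tail))
        self.neighbors[head].add((relation, tail))

    def has(self, head, relation, tail):
        """Exact lookup of a single fact."""
        return (head, relation, tail) in self.triples

kg = TinyKG()
kg.add("labrador", "is_a", "dog")
```

A single `add` call suffices to make the fact retrievable exactly, which is the precision and sparsity advantage of the KG representation discussed above.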
Therefore, we propose JAKET, a Joint pre-trAining framework for KnowledgE graph and Text. Our framework contains a knowledge module and a language module, which mutually assist each other by providing the information required for more effective semantic analysis. The knowledge module leverages a graph attention network (Veličković et al., 2017) to provide structure-aware entity embeddings for language modeling, while the language module produces contextual representations that serve as initial embeddings for KG entities and relations, given their descriptive text. Thus, in both modules, content understanding is grounded in related knowledge and rich context. On one hand, the joint pre-training effectively projects entities/relations and text into a shared semantic latent space,

