LANGUAGE MODELS ARE OPEN KNOWLEDGE GRAPHS

Anonymous authors
Paper under double-blind review

Abstract

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3) without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled language models to improve downstream NLP tasks, e.g., answering questions and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs can be constructed with a single forward pass of a pre-trained language model (without fine-tuning) over a corpus. We demonstrate the quality of the constructed KGs by comparing them to two human-created KGs (Wikidata, TAC KBP). Our KGs also provide open factual knowledge that is absent from existing KGs. Our code and KGs will be made publicly available.

1. INTRODUCTION

Knowledge graphs (KGs) are an important resource for both humans and machines. Factual knowledge in KGs is injected into AI applications to imitate important human skills, e.g., reasoning and understanding. KG construction is mainly supervised, requiring humans to handwrite every fact, as in Freebase (Bollacker et al., 2008) and Wikidata. KGs can also be constructed in a semi-supervised way, in which a semi-automatic extractor obtains facts from web corpora (e.g., NELL (Carlson et al., 2010) and Knowledge Vault (Dong et al., 2014)). Humans, however, still need to interact with the extractor to improve the quality of the discovered facts. Constructing KGs therefore requires human supervision, which is often expensive.

Recent progress in language models (LMs), such as BERT (Devlin et al., 2018) and GPT-2/3 (Radford et al., 2019; Brown et al., 2020), has led to superior results, even outperforming humans on a wide range of tasks, e.g., sentence classification (Wang et al., 2018) and question answering (Brown et al., 2020). Pre-trained LMs are also capable of writing poetry, music, and code, tasks that often require humans to spend a significant amount of time acquiring the relevant knowledge. In fact, these pre-trained LMs automatically acquire factual knowledge from large-scale corpora (e.g., BookCorpus (Zhu et al., 2015), Common Crawl (Brown et al., 2020)) via pre-training. The knowledge learned during pre-training is key to this success. We therefore consider the following question: instead of using manually created knowledge, can we use the knowledge stored in pre-trained LMs to construct KGs?

In this paper, we design an unsupervised approach called MAMA that recovers the factual knowledge stored in LMs to build KGs from scratch. MAMA constructs a KG with a single forward pass of a pre-trained LM (without fine-tuning) over a textual corpus. As illustrated in Figure 1, MAMA has two stages: Match and Map. The Match stage generates a set of candidate facts by matching the facts in the textual corpus with the knowledge in the pre-trained LM. Because general or world knowledge from large-scale corpora is embedded in the LM, candidate facts in the target corpus are often covered by the knowledge in the LM. The candidate facts are matched through an efficient beam search in the attention weight matrices of the pre-trained LM, without fine-tuning. The Map stage produces an open KG by mapping the matched candidate facts from the Match stage to both a fixed KG schema and an open schema. If the schema of a candidate fact exists in the KG schema, we map the candidate fact directly to the fixed KG schema; otherwise, we retain the unmapped candidate fact in the open schema.
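To make the Match stage concrete, below is a minimal sketch of an attention-guided beam search written against the HuggingFace transformers API. It assumes attention weights averaged over all layers and heads, scores a search path by the product of step-wise attention weights, and takes the head/tail entity positions as given; the model choice and scoring details are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"  # illustrative choice of pre-trained LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def attention_matrix(sentence):
    """Run one forward pass (no fine-tuning) and return the tokens plus a
    (seq_len, seq_len) attention matrix averaged over all layers and
    heads (a simplifying assumption)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # out.attentions: tuple of num_layers tensors, each (1, heads, seq, seq)
    att = torch.stack(out.attentions).mean(dim=(0, 2)).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return tokens, att

def match(sentence, head_idx, tail_idx, beam_size=4):
    """Beam-search the tokens between a head and a tail entity position,
    scoring each step by how strongly the candidate next token attends
    to the current one. Returns (relation_tokens, score) candidates."""
    tokens, att = attention_matrix(sentence)
    beams = [([], head_idx, 1.0)]  # (relation so far, position, score)
    finished = []
    while beams:
        expanded = []
        for rel, pos, score in beams:
            for nxt in range(pos + 1, tail_idx + 1):  # move left to right
                step_score = score * att[nxt, pos].item()
                if nxt == tail_idx:  # reached the tail entity: emit a fact
                    finished.append((rel, step_score))
                else:
                    expanded.append((rel + [tokens[nxt]], nxt, step_score))
        expanded.sort(key=lambda b: b[2], reverse=True)
        beams = expanded[:beam_size]
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:beam_size]

# Usage: token indices depend on the tokenizer's output. For a sentence
# such as "Dylan is a songwriter", inspect attention_matrix(...)[0] to
# locate the head/tail positions, then call match(...) to rank relation
# phrases (e.g., ["is", "a"]) by their accumulated attention scores.
```

Multiplying the step-wise attention weights naturally penalizes long paths, biasing the search toward concise relation phrases between the two entities.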

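Similarly, here is a minimal sketch of the Map stage's routing decision. The relation-to-Wikidata-property table is a hypothetical hand-built example standing in for whatever schema mapping is available; only the lookup-or-fall-back-to-open-schema logic reflects the description above.

```python
from typing import Optional

# Hypothetical mapping from relation phrases to Wikidata property IDs;
# a real system would use entity/relation linking rather than a lookup table.
FIXED_SCHEMA = {
    "born in": "P19",       # Wikidata: place of birth
    "capital of": "P1376",  # Wikidata: capital of
}

def map_fact(head: str, relation: str, tail: str) -> dict:
    """Map a candidate (head, relation, tail) triple to the fixed KG schema
    when the relation is known; otherwise keep it in the open schema."""
    pid: Optional[str] = FIXED_SCHEMA.get(relation)
    if pid is not None:
        return {"schema": "fixed", "triple": (head, pid, tail)}
    return {"schema": "open", "triple": (head, relation, tail)}

print(map_fact("Dylan", "born in", "Duluth"))
# {'schema': 'fixed', 'triple': ('Dylan', 'P19', 'Duluth')}
print(map_fact("Dylan", "signed", "Albert Grossman"))
# {'schema': 'open', 'triple': ('Dylan', 'signed', 'Albert Grossman')}
```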
