LANGUAGE MODELS ARE OPEN KNOWLEDGE GRAPHS
Anonymous authors
Paper under double-blind review

Abstract

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3), without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled the language models to improve downstream NLP tasks, e.g., answering questions, and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora. We demonstrate the quality of the constructed KGs by comparing them to two KGs (Wikidata, TAC KBP) created by humans. Our KGs also provide open factual knowledge that is not present in the existing KGs. Our code and KGs will be made publicly available.

1. INTRODUCTION

Knowledge graphs (KGs) are an important resource for both humans and machines. Factual knowledge in KGs is injected into AI applications to imitate important skills possessed by humans, e.g., reasoning and understanding. KG construction is mainly supervised, requiring humans to handwrite every fact, as in Freebase (Bollacker et al., 2008) and Wikidata. KGs can also be constructed in a semi-supervised way, in which a semi-automatic extractor obtains the facts from web corpora (e.g., NELL (Carlson et al., 2010) and Knowledge Vault (Dong et al., 2014)). Humans, however, still need to interact with the extractor to improve the quality of the discovered facts. Therefore, human supervision, which is often expensive, is required in constructing KGs. Recent progress in language models (LMs), such as BERT (Devlin et al., 2018) and GPT-2/3 (Radford et al., 2019; Brown et al., 2020), has led to superior results that even outperform humans on a wide range of tasks, e.g., sentence classification (Wang et al., 2018) and question answering (Brown et al., 2020). Pre-trained LMs are also capable of writing poetry, music, and code, tasks that often require humans to spend a significant amount of time acquiring the relevant knowledge. In fact, these pre-trained LMs automatically acquire factual knowledge from large-scale corpora (e.g., BookCorpus (Zhu et al., 2015), Common Crawl (Brown et al., 2020)) via pre-training. The knowledge learned by pre-trained LMs is the key to their current success. We therefore consider the following question: instead of using manually created knowledge, can we use the knowledge stored in pre-trained LMs to construct KGs? In this paper, we design an unsupervised approach called MAMA that recovers the factual knowledge stored in LMs to build KGs from scratch. MAMA constructs a KG with a single forward pass of a pre-trained LM (without fine-tuning) over a textual corpus.
As illustrated in Figure 1, MAMA has two stages: Match and Map. The Match stage generates a set of candidate facts by matching the facts in the textual corpus against the knowledge in the pre-trained LM. Because general or world knowledge from large-scale corpora is embedded in the LM, the candidate facts in the target corpus are often covered by the knowledge in the LM. The candidate facts are matched through an efficient beam search in the attention weight matrices of the pre-trained LM, without fine-tuning. The Map stage produces an open KG by mapping the matched candidate facts from the Match stage to both a fixed KG schema and an open schema. If the schema of a candidate fact exists in the KG schema, we map the candidate fact directly to the fixed KG schema. Otherwise, we keep the unmapped candidate fact in the open schema. This results in a new type of KG, an open KG, with a mixture of mapped facts in the fixed KG schema and unmapped facts in the open schema.

Figure 1: MAMA (1) generates a set of candidate facts by matching the knowledge in the pre-trained LM with facts in the textual corpus, e.g., a candidate fact (Dylan, is, songwriter) from the sentence "Dylan is a songwriter.", and (2) produces an open KG by mapping the matched candidate facts to both an existing KG schema, e.g., (Bob Dylan.Q392, occupation.P106, Songwriter.Q753110) in the Wikidata schema, and an open schema, e.g., (Bob Dylan.Q392, sign, Albert Grossman.Q708584).

Our contributions are as follows:

1. We show how to construct KGs from pre-trained LMs. The KGs are constructed with a single forward pass of the pre-trained LMs (without fine-tuning) over the textual corpora. This helps researchers explicitly understand what the language models learn, bridging the deep LM and KG communities through enhanced model transparency.

2. We propose an unsupervised two-stage approach, MAMA, that first matches the candidate facts in the corpora with the knowledge stored in LMs, then maps the matched candidate facts to both a fixed and an open schema to produce a KG.

3. We generate a new type of KG, namely an open KG, which consists of mapped facts in the fixed KG schema of existing KGs (Wikidata and TAC KBP) annotated by humans, and unmapped facts in an open schema that are new relative to the reference KG schema. The reach of this result is broad, with downstream utility for knowledge graph construction, deep neural network interpretation, and information extraction.

2. MAMA

We introduce an unsupervised end-to-end approach, Match and Map (MAMA), illustrated in Figure 1, to construct open knowledge graphs (KGs) from language models (LMs). MAMA constructs the KGs with a single forward pass of the pre-trained LMs (without fine-tuning) over the corpora.

The two stages of MAMA are:

Match generates a set of candidate facts from a textual corpus. LMs contain global or world knowledge learned from large-scale corpora, which often does not perfectly match the knowledge in the target corpus. The goal of this stage is to match the knowledge stored in pre-trained LMs with the facts in the corpus. Each fact is represented as a triplet (head, relation, tail), in short (h, r, t), and passed to the Map stage. The Match procedure is detailed in Sec. 2.1.

Map produces an open KG using the matched candidate facts from the Match stage. The constructed open KG has two portions: (a) mapped candidate facts that are in a fixed KG schema, e.g., (Dylan, is, songwriter) is mapped to (Bob Dylan.Q392, occupation.P106, Songwriter.Q753110) according to the Wikidata schema; and (b) unmapped candidate facts that are in an open schema, e.g., a candidate fact (Dylan, signed, Albert Grossman) is partially mapped to (Bob Dylan.Q392, sign, Albert Grossman.Q708584) in the open schema. This stage is described in Sec. 2.2.
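To make the Match stage concrete, the following is a minimal pure-Python sketch of a beam search over an attention matrix. The `match` function, the hand-crafted attention matrix, and the path-scoring scheme are illustrative assumptions, not the paper's exact algorithm: the idea shown is that a candidate relation is the sequence of intermediate tokens on the highest-scoring attention path from the head entity to the tail entity.

```python
# Toy sketch of the Match stage: beam search over an attention matrix
# to recover a relation span between a head and a tail entity.
# The attention matrix is hand-crafted for illustration; in MAMA it
# comes from a single forward pass of a pre-trained LM (no fine-tuning).

def match(tokens, attn, head, tail, beam_size=2):
    """Find the highest-scoring token path from `head` to `tail`.

    attn[i][j] is a stand-in for the LM's attention weight between
    positions i and j. A path visits strictly increasing positions;
    its score is the sum of attention weights along its edges, and the
    intermediate tokens it visits form the candidate relation.
    Returns ((head, relation, tail), score).
    """
    # Each beam entry: (accumulated score, current position, relation tokens)
    beams = [(0.0, head, [])]
    while not all(pos == tail for _, pos, _ in beams):
        new_beams = []
        for score, pos, rel in beams:
            if pos == tail:
                new_beams.append((score, pos, rel))  # finished path
                continue
            # Expand to every later position, scored by attention weight.
            for nxt in range(pos + 1, tail + 1):
                toks = rel if nxt == tail else rel + [tokens[nxt]]
                new_beams.append((score + attn[pos][nxt], nxt, toks))
        new_beams.sort(key=lambda b: -b[0])
        beams = new_beams[:beam_size]  # keep only the top-k paths
    best_score, _, best_rel = beams[0]
    return (tokens[head], " ".join(best_rel), tokens[tail]), best_score
```

On the paper's running example, the tokens of "Dylan is a songwriter." with "Dylan" as head and "songwriter" as tail would yield the candidate fact (Dylan, is, songwriter) whenever the attention path through "is" dominates.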


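The Map stage's split between fixed and open schema can likewise be sketched. The lookup tables and the `map_fact` helper below are illustrative stand-ins, not the paper's actual entity-linking or relation-mapping machinery: entities and relations found in the reference KG schema are rewritten to their Wikidata identifiers, and a fact whose relation is unknown to that schema is kept, possibly partially linked, in the open schema.

```python
# Toy sketch of the Map stage: link a candidate fact's entities and
# relation to a fixed KG schema when possible; otherwise keep the fact
# in the open schema. The dictionaries are illustrative stand-ins for
# real entity linking and relation mapping against Wikidata.

ENTITY_LINKS = {
    "Dylan": "Bob Dylan.Q392",
    "songwriter": "Songwriter.Q753110",
    "Albert Grossman": "Albert Grossman.Q708584",
}
RELATION_LINKS = {  # relations known to the reference KG schema
    "is": "occupation.P106",
}

def map_fact(head, rel, tail):
    """Return (mapped_fact, schema), where schema is 'fixed' or 'open'."""
    h = ENTITY_LINKS.get(head, head)  # fall back to surface form
    t = ENTITY_LINKS.get(tail, tail)
    if rel in RELATION_LINKS:
        return (h, RELATION_LINKS[rel], t), "fixed"
    # Relation unseen in the reference schema: the (possibly partially
    # linked) fact is kept in the open schema.
    return (h, rel, t), "open"
```

Under these assumptions, (Dylan, is, songwriter) maps fully into the fixed Wikidata schema, while (Dylan, sign, Albert Grossman) is only partially mapped and stays in the open schema, mirroring the paper's two example facts.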