LANGUAGE MODELS ARE OPEN KNOWLEDGE GRAPHS

Anonymous authors
Paper under double-blind review

Abstract

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3), without human supervision. Popular KGs (e.g., Wikidata, NELL) are built in either a supervised or semi-supervised manner, requiring humans to create knowledge. Recent deep language models automatically acquire knowledge from large-scale corpora via pre-training. The stored knowledge has enabled the language models to improve downstream NLP tasks, e.g., answering questions and writing code and articles. In this paper, we propose an unsupervised method to cast the knowledge contained within language models into KGs. We show that KGs are constructed with a single forward pass of the pre-trained language models (without fine-tuning) over the corpora. We demonstrate the quality of the constructed KGs by comparing to two KGs (Wikidata, TAC KBP) created by humans. Our KGs also provide open factual knowledge that is absent from the existing KGs.

1. INTRODUCTION

Knowledge graphs (KGs) are an important resource for both humans and machines. Factual knowledge in KGs is injected into AI applications to imitate important skills possessed by humans, e.g., reasoning and understanding. KG construction is mainly supervised, requiring humans to handwrite every fact, as in Freebase (Bollacker et al., 2008) and Wikidata. KGs can also be constructed in a semi-supervised way, in which a semi-automatic extractor obtains the facts from web corpora (e.g., NELL (Carlson et al., 2010) and Knowledge Vault (Dong et al., 2014)). Humans, however, still need to interact with the extractor to improve the quality of the discovered facts. Therefore, human supervision, which is often expensive, is required in constructing KGs.

Recent progress in language models (LMs), such as BERT (Devlin et al., 2018) and GPT-2/3 (Radford et al., 2019; Brown et al., 2020), has led to superior results, even outperforming humans in a wide range of tasks, e.g., sentence classification (Wang et al., 2018) and question answering (Brown et al., 2020). Pre-trained LMs are also capable of writing poetry, music, and code, while such tasks often require humans to spend a significant amount of time learning the relevant knowledge to work well. In fact, these pre-trained LMs automatically acquire factual knowledge from large-scale corpora (e.g., BookCorpus (Zhu et al., 2015), Common Crawl (Brown et al., 2020)) via pre-training. The learned knowledge in pre-trained LMs is the key to their current success. We therefore consider the following question: instead of using manually created knowledge, can we use the knowledge stored in pre-trained LMs to construct KGs?

In this paper, we design an unsupervised approach called MAMA that successfully recovers the factual knowledge stored in LMs to build KGs from scratch. MAMA constructs a KG with a single forward pass of a pre-trained LM (without fine-tuning) over a textual corpus.
As illustrated in Figure 1, MAMA has two stages: Match and Map. The Match stage generates a set of candidate facts by matching the facts in the textual corpus with the knowledge in the pre-trained LM. General or world knowledge from large-scale corpora is embedded in the LM, so candidate facts in the target corpus are often covered by the knowledge in the LM. The candidate facts are matched through an efficient beam search in the attention weight matrices of the pre-trained LM without fine-tuning. The Map stage produces an open KG by mapping the matched candidate facts from the Match stage to both a fixed KG schema and an open schema. If the schema of a candidate fact exists in the KG schema, we map the candidate fact directly to the fixed KG schema. Otherwise, we reserve the unmapped candidate facts in the open schema. This results in a new type of KG, an open KG, with a mixture of mapped facts in the fixed KG schema and unmapped facts in the open schema.

[Figure 1: MAMA constructs an open KG with a single forward pass of the pre-trained language model (LM) (without fine-tuning) over the corpus. Given the input: a textual corpus containing passages and sentences, e.g., English Wikipedia, and a pre-trained LM, e.g., BERT, GPT-2/3, MAMA (1) generates a set of candidate facts via matching the knowledge in the pre-trained LM with facts in the textual corpus, e.g., a candidate fact (Dylan, is, songwriter) from the sentence "Dylan is a songwriter.", and (2) produces an open KG by mapping the matched candidate facts to both an existing KG schema, e.g., (Bob Dylan.Q392, occupation.P106, Songwriter.Q753110) in the Wikidata schema, and an open schema, e.g., (Bob Dylan.Q392, sign, Albert Grossman.Q708584).]

Our contributions are as follows:
1. We show how to construct KGs from pre-trained LMs. The KGs are constructed with a single forward pass of the pre-trained LMs (without fine-tuning) over the textual corpora. This helps researchers explicitly understand what the language models learn, bridging the deep LM and KG communities through enhanced model transparency.
2. We propose an unsupervised two-stage approach, MAMA, to first match the candidate facts in the corpora with the knowledge stored in LMs, then map the matched candidate facts to both a fixed and an open schema to produce a KG.
3. We generate a new type of KG, namely an open KG, which consists of mapped facts in the fixed KG schema of existing KGs (Wikidata and TAC KBP) annotated by humans, and unmapped facts in an open schema that are new relative to the reference KG schema.

The reach of this result is broad and has downstream utility for knowledge graph construction, deep neural network interpretation, and information extraction.

2. MAMA

We introduce an unsupervised end-to-end approach, Match and Map (MAMA), as illustrated in Figure 1, to construct open knowledge graphs (KGs) from language models (LMs). MAMA constructs the KGs with a single forward pass of the pre-trained LMs (without fine-tuning) over the corpora. The two stages of MAMA are:

Match generates a set of candidate facts from a textual corpus. LMs contain global or world knowledge learned from large-scale corpora, which often does not perfectly match the knowledge in the target corpus. The goal of this stage is to match the knowledge stored in pre-trained LMs with facts in the corpus. Each fact is represented as a triplet (head, relation, tail), in short, (h, r, t), and passed to the Map stage. The Match procedure is detailed in Sec. 2.1.

Map produces an open KG using the matched candidate facts from the Match stage. The constructed open KG has two portions: (a) mapped candidate facts that are in a fixed KG schema, e.g., (Dylan, is, songwriter), and (b) unmapped candidate facts that are in an open schema. The Map procedure is detailed in Sec. 2.2.

[Figure 2: Illustration of the Match stage. (a) General matching steps; (b) attention matrix for matching degree calculation. The upper part of (a) represents the general matching steps of generating the best matched candidate fact (Dylan, is, songwriter) from the sentence "Dylan is a songwriter." The lower portion shows the corresponding step-by-step process. Given a head-tail pair (Dylan, songwriter), at each step, the search chooses one of the actions, i.e., START, YIELD, STOP, to produce an intermediate candidate fact. The search starts by adding the head "Dylan" as an initial candidate (step 0). The matching degree of the candidate is initialized to 0. Next, a new candidate is yielded if the candidate has not reached the tail "songwriter" (steps 1 and 2), by appending the next most attended token (with the largest score from the attention matrix (b) of the sentence) to the end of the current candidate; the corresponding matching degrees are increased by the associated attention scores (0.3 and 0.4) to 0.3 (0+0.3) and 0.7 (0.3+0.4) respectively. Otherwise, the search stops, and the candidate fact with the best matching degree is returned for the head-tail pair (step 3). The attention matrix (b) is from the forward pass of the LM without fine-tuning over the sentence. "x" marks the tokens excluded to prevent searching backward.]

2.1. MATCH

We frame the matching procedure as a search problem. To obtain the best matched candidate facts for an input sentence, the candidates with the top matching degrees are returned from a search process. The matching degree is derived from a search in the attention weight matrices of the pre-trained LM, since the attention weight matrices are one of the main containers of knowledge in the pre-trained LM. The attention weight matrices are obtained simply from the forward pass of the LM without fine-tuning over the sentence.

2.1.1. BEAM SEARCH

We design a simple yet effective beam search to find the best matched candidate facts. For every head-tail pair (h, t) in a sentence, the search maintains the k-best matched candidate facts of the pair. Let's first consider the search from left to right with beam size equal to 1. An example search process is shown in Figure 2. Given a head-tail pair (Dylan, songwriter), at each step, the search performs one of the following actions:

START the search from the head. The head h is added as an initial candidate into the beam. For simplicity, we use START(h) to denote the action, which returns a partial candidate (h,. In Figure 2(a), at step 0, the head "Dylan" is added as (Dylan, into the beam. The matching degree is initialized to 0.

YIELD a new intermediate candidate in the beam if the current candidate has not reached the tail. The next most attended token (with the largest score from the attention matrix) is appended to the end of the current candidate to yield the new candidate. The corresponding matching degree is increased by the associated attention score. At step 1 (orange arrow in Figure 2(a)), "is" is appended to the current candidate to yield (Dylan, is,, since "is" has the largest attention score with "Dylan" in the attention matrix. The attention score is 0.3, as highlighted in orange in Figure 2(b). The matching degree becomes 0.3 (i.e., 0+0.3). The multi-head attention is reduced to a single head so that every two tokens of the sentence are associated with one attention weight; we experiment with different reduction setups in Sec. A.3. "x" marks the tokens (prior to the current token) that are not considered in the search, to prevent searching backward. Step 2 similarly takes the YIELD action to produce (Dylan, is songwriter,. The matching degree is now 0.7 (i.e., 0.3+0.4).

[Algorithm 1: Beam search for matching candidate facts; it maintains the k-best candidates in the beam and returns T(h,t).]
We use YIELD(c, s, A_s) to denote the action, where c is the current candidate, s represents the sentence, and A_s is the attention matrix from the forward pass of the pre-trained LM over s; the action yields a new candidate.

STOP the search step if the candidate has reached the tail, then add the candidate as a valid candidate fact into the beam. As the beam size equals 1, (Dylan, is, songwriter) is the only returned candidate fact for the given pair. The final matching degree of the candidate is 0.7. We denote this step by STOP(c, t), which returns a valid fact.

The details of the proposed beam search are in Algorithm 1. The inputs of the search algorithm are a head-tail pair (h, t), a sentence s, and the attention matrix A_s of s. Both h and t are identified as noun chunks in s. A_s is the attention matrix associated with s from the forward pass of the LM without fine-tuning. The search starts by adding the head h as the initial candidate in the beam (line 1). While there are still new candidates waiting to be yielded (line 2), the search continues, and the top k candidates sorted by matching degree are maintained in the beam (lines 3-11). In practice, we implement an action manager O to decide which action to take at each step. Given a candidate c in the beam, O(c) = START always happens at the beginning of the search. If c has not reached the tail t yet, O(c) = YIELD. Otherwise, O(c) = STOP. We convert subwords to the corresponding full words. We also notice that some facts appear in reverse order in the sentence, e.g., "... said Jason Forcier, a vice president at battery maker A123 Systems Inc." for facts of the relation "org:top members employees"; we therefore enable bidirectionality by running the algorithm in both directions (left to right and right to left). The beam search is implemented as breadth-first search, which is efficient: the time complexity is O(k · d), where d is the maximum depth of the search tree.
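The search described above can be sketched in code. The following is a minimal, illustrative re-implementation of Algorithm 1 (our own function and variable names, not the authors' released code), run on a toy attention matrix mimicking the scores in Figure 2(b):

```python
# Sketch of the Match-stage beam search, assuming a toy single-head
# attention matrix. attn[i][j] is the attention score between tokens
# i and j; only forward positions (j > i) are searched, mirroring the
# "x"-marked tokens that prevent searching backward.

def match(head, tail, tokens, attn, beam_size=1):
    """Return candidate facts (h, r, t) sorted by matching degree."""
    h_idx, t_idx = tokens.index(head), tokens.index(tail)
    beam = [([h_idx], 0.0)]   # START: the head alone, degree 0
    facts = []
    while beam:
        new_beam = []
        for cand, degree in beam:
            cur = cand[-1]
            if cur == t_idx:                       # STOP: tail reached
                relation = " ".join(tokens[i] for i in cand[1:-1])
                facts.append(((head, relation, tail), degree))
                continue
            for nxt in range(cur + 1, len(tokens)):  # YIELD forward
                new_beam.append((cand + [nxt], degree + attn[cur][nxt]))
        # Maintain the k-best intermediate candidates in the beam.
        beam = sorted(new_beam, key=lambda x: -x[1])[:beam_size]
    return sorted(facts, key=lambda x: -x[1])

tokens = ["Dylan", "is", "a", "songwriter"]
# Toy attention scores mimicking Figure 2(b).
attn = [[0.0, 0.3, 0.1, 0.2],
        [0.0, 0.0, 0.1, 0.4],
        [0.0, 0.0, 0.0, 0.2],
        [0.0, 0.0, 0.0, 0.0]]
best = match("Dylan", "songwriter", tokens, attn)[0]
print((best[0], round(best[1], 2)))
# (('Dylan', 'is', 'songwriter'), 0.7)
```

With beam size 1 this reduces to the greedy walkthrough of Figure 2: the search extends "Dylan" with "is" (score 0.3), then reaches "songwriter" (score 0.4), returning the fact with matching degree 0.7.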

2.1.2. FILTER

Although the basic functionality provided by the beam search is sufficient for finding useful candidate facts, we have found a few constraints useful. Given a candidate fact (h, r, t) from the beam search result T(h,t), it remains a fact if it satisfies all of the following constraints.

Constraint #1 The matching degree of (h, r, t) is above a threshold. We compare the matching degrees corpus-wide to retain only the facts that are matched better with the knowledge in LMs. For example, MAMA extracts a fact (Rolling Stone, wrote, pop song) from "Rolling Stone wrote: "No other pop song has so thoroughly challenged artistic conventions"", which is not an accurate fact based on the sentence. We observe that the associated matching degree is below a proper threshold, while the matching degrees of high-quality facts from the same document, e.g., (Dylan, is, songwriter), or confident facts from other documents are beyond the threshold.

Constraint #2 The distinct frequency of r is above a threshold. To avoid over-specified facts, e.g., (Dylan, signed to Sam Peckinpah's film, Pat Garrett and Billy the Kid), we require that r take many distinct head-tail pairs in the corpus.

Constraint #3 Relation r is a contiguous sequence in the sentence. This avoids relations that have no meaningful interpretation (Fader et al., 2011), e.g., (Rolling Stone, wrote challenged, conventions) from the above sentence.
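The three constraints can be sketched as a single corpus-wide filtering pass. This is our own illustration (the substring test is a simple proxy for contiguity; function names and the toy data are ours), with the thresholds following the settings reported in Sec. A.1:

```python
from collections import defaultdict

def filter_facts(candidates, degree_threshold=0.005, min_pairs=10):
    """Apply Constraints #1-#3 to candidate facts.

    candidates: list of ((h, r, t), matching_degree, sentence) tuples,
    collected corpus-wide so relation frequencies are meaningful.
    """
    # Constraint #2: count distinct head-tail pairs per relation phrase.
    pairs = defaultdict(set)
    for (h, r, t), _, _ in candidates:
        pairs[r].add((h, t))

    kept = []
    for (h, r, t), degree, sentence in candidates:
        if degree < degree_threshold:   # Constraint #1: matching degree
            continue
        if len(pairs[r]) < min_pairs:   # Constraint #2: distinct frequency
            continue
        if r not in sentence:           # Constraint #3: contiguous span
            continue
        kept.append((h, r, t))
    return kept

corpus_facts = [
    (("Dylan", "is", "songwriter"), 0.7, "Dylan is a songwriter."),
    (("Rolling Stone", "wrote challenged", "conventions"), 0.4,
     "Rolling Stone wrote: no other pop song has so thoroughly "
     "challenged artistic conventions."),
]
print(filter_facts(corpus_facts, min_pairs=1))
# [('Dylan', 'is', 'songwriter')]
```

The over-specified relation "wrote challenged" is discarded by the contiguity check even though its matching degree clears the threshold.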

2.2. MAP

2.2.1. MAPPED FACTS IN KG SCHEMA

The goal is to map a candidate fact (h, r, t) to a fact (h_k, r_k, t_k) in the KG schema. The reason for mapping to an existing KG schema is to make use of the high-quality schema designed by experts (avoiding the duplicated effort of building one from scratch) and to enable evaluating the candidate facts against oracle KG facts contributed by human volunteers. We first map both entities h, t to h_k, t_k, then map the relation r to r_k in the reference KG schema. Additional details are presented in Sec. A.1.
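As a concrete illustration of this mapping step, consider the following sketch, where the toy dictionaries stand in for the entity linker output and the offline relation map (both are assumptions for illustration, not the paper's actual resources):

```python
# Toy stand-ins for the entity linker output and the offline relation map.
entity_link = {"Dylan": "Bob_Dylan.Q392",
               "songwriter": "Songwriter.Q753110",
               "Albert Grossman": "Albert_Grossman.Q708584"}
relation_map = {"is": "occupation.P106"}

def map_fact(fact):
    """Map (h, r, t) to the fixed KG schema when possible; otherwise
    keep the partially unmapped fact in the open schema."""
    h, r, t = fact
    h_k, t_k = entity_link.get(h), entity_link.get(t)
    if h_k is None or t_k is None:
        return None                   # entities not linkable
    r_k = relation_map.get(r)
    if r_k is not None:
        return (h_k, r_k, t_k)        # fully mapped: fixed KG schema
    return (h_k, r, t_k)              # partially unmapped: open schema

print(map_fact(("Dylan", "is", "songwriter")))
# ('Bob_Dylan.Q392', 'occupation.P106', 'Songwriter.Q753110')
print(map_fact(("Dylan", "sign", "Albert Grossman")))
# ('Bob_Dylan.Q392', 'sign', 'Albert_Grossman.Q708584')
```

The second call shows how a fact whose relation phrase has no counterpart in the reference schema stays in the open schema with linked entities, matching the Figure 1 example.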

Entity linking to KG schema

We adapt an unsupervised entity linker based on a mention-to-entity dictionary (Spitkovsky & Chang, 2012) to link the entities, for scalability considerations. Since contextual information is crucial for linking entities correctly, we use the word embeddings of the context to disambiguate the entities; that is, we only link entities with high contextual similarity based on the word embeddings. We apply the entity linker to map h, t to h_k, t_k.

Relation mapping with KG schema We largely follow the relation mapping method proposed by Angeli et al. (2015) to construct an offline relation map between KG relations and the relation phrases of the candidate facts. The basic idea is that the more often linked head-tail pairs (i.e., entities with type information from the entity linking step) co-occur between the candidate facts and KG facts, the more likely the corresponding relations are mapped to each other.

The unmapped candidate facts are kept in an additional open schema to improve the coverage of the KGs, which benefits downstream KG-based applications, e.g., QA and commonsense reasoning (Wang et al., 2019; Brown et al., 2020).
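The dictionary-plus-context-similarity idea can be illustrated as follows. The dictionary, embeddings, and threshold below are toy stand-ins (the paper uses the Spitkovsky & Chang (2012) dictionary and GloVe vectors, per Sec. A.1):

```python
import math

# Toy mention-to-entity dictionary: a surface mention may be ambiguous.
mention2entities = {"dylan": ["Bob_Dylan.Q392", "Dylan_Thomas.Q82083"]}
# Toy 2-d embeddings; the paper uses GloVe vectors for disambiguation.
embed = {"Bob_Dylan.Q392": [1.0, 0.1],
         "Dylan_Thomas.Q82083": [0.1, 1.0],
         "songwriter": [0.9, 0.2]}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def link(mention, context_word, threshold=0.5):
    """Pick the candidate entity most similar to the context; return
    None if no candidate clears the similarity threshold."""
    best, best_sim = None, threshold
    for entity in mention2entities.get(mention.lower(), []):
        sim = cos(embed[entity], embed[context_word])
        if sim > best_sim:
            best, best_sim = entity, sim
    return best

print(link("Dylan", "songwriter"))
# Bob_Dylan.Q392
```

In the "Dylan is a songwriter." context, the songwriter sense wins over the poet sense; with an unmet threshold the mention is left unlinked, which is what pushes a candidate fact toward the open schema.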

3. EXPERIMENTS

How well can language models generate knowledge graphs? We experimentally explore how well MAMA answers this question in this section.

3.1. RESULTS ON MAPPED FACTS

We first study the quality of the mapped facts. As the candidate facts have been mapped to the schema of the oracle KGs, we are able to quantitatively compare the candidate facts with the oracle facts in the reference KGs.

3.1.1. DATASETS

We compare the mapped facts from MAMA with the facts in two KGs:

TAC KBP TAC Knowledge Base Population (KBP) Slot Filling is a task to search a document collection to fill in the tail/object entity for predefined relations (slots) for a given head/subject entity in a reference KG. We experiment with the reference KG of the 2013 challenge, using the document collection and oracle facts of the 2013 task. The statistics of the dataset are shown in Table 1.

Wikidata We use the popular Wikidata as the other KG, with all the oracle facts in Wikidata. We use English Wikipedia as the text corpus, since a large number of facts in Wikidata are from English Wikipedia. The statistics are in Table 1.

To evaluate the mapped facts, we first run the Match stage of MAMA over the corresponding documents to generate the candidate facts. The Map stage is then leveraged to map the candidate facts to the schemas of TAC KBP and Wikidata respectively. The parameter settings, such as the beam size in Algorithm 1, are shared across TAC KBP and Wikidata based on the parameter study in Sec. A.3.
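Once candidate facts are mapped into the reference schema, they can be scored against the oracle facts with standard precision/recall/F1. This is our own sketch of that comparison (the fact IDs below are illustrative, not drawn from the actual evaluation data):

```python
# Score mapped facts against oracle KG facts as exact triple matches.
def prf1(predicted, oracle):
    predicted, oracle = set(predicted), set(oracle)
    tp = len(predicted & oracle)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(oracle) if oracle else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pred = [("Q392", "P106", "Q753110"),   # correct mapped fact
        ("Q392", "P551", "Q23197")]    # not in the oracle set
gold = [("Q392", "P106", "Q753110"),
        ("Q392", "P19", "Q485172")]    # missed by the system
print(prf1(pred, gold))
# (0.5, 0.5, 0.5)
```

Note that, as the error analysis in Sec. A.2 observes, some "false positives" under this exact-match scoring are actually correct facts simply missing from the reference KG.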

3.1.2. TAC KBP

To verify the ability to produce correct facts, we compare candidate facts from MAMA to the outputs of two open information extraction systems, which also produce triplets in the form of (h, r, t). After collecting the triplets from the corresponding system, we use the same Map procedure as MAMA (Sec. 2.2.1) to map the triplets to the corresponding KG schema. Stanford OpenIE leverages POS tags and dependency parses, and generates self-contained clauses from long sentences to extract the triplets; it is the best open information extraction system (Angeli et al., 2015) on TAC KBP (Surdeanu, 2013). OpenIE 5.1 is one of the state-of-the-art open information extraction systems; it is the successor to Ollie (Schmitz et al., 2012), and improves extractions from noun relations, numerical sentences, and conjunctive sentences using linguistic patterns. We use two families of pre-trained LMs with MAMA.

Larger/deeper LMs produce KGs of higher quality. MAMA-BERT LARGE outperforms MAMA-BERT BASE, owing to its doubled parameter size. The GPT-2 models share similar trends, where we observe performance increases as the model size increases. This matches our intuition that more knowledge is stored in deeper and larger models. Such increases in performance seem subtle on TAC KBP; we find this might be due to the relatively small number of oracle facts, as we notice a more significant improvement on Wikidata in Sec. 3.1.3. We plan to further improve the results with larger pre-trained LMs, e.g., GPT-3 (Brown et al., 2020) and Megatron-LM (Shoeybi et al., 2019).

BERT LMs outperform GPT-2 LMs under similar model sizes. More specifically, MAMA-BERT BASE performs better than MAMA-GPT-2 in F1, and MAMA-BERT LARGE outperforms MAMA-GPT-2 MEDIUM in F1. BERT BASE and GPT-2 are similar in size, and BERT LARGE and GPT-2 MEDIUM are similar in model size as well. This is mainly because the recall of BERT LMs is higher than that of the corresponding GPT-2 LMs.
The result indicates that the Cloze-style loss function (i.e., the masked language model objective) of BERT is more effective and flexible in recovering knowledge than the autoregressive LM objective. We also notice that the precision of GPT-2 LMs is higher than that of the corresponding BERT LMs. Similar to the observations on TAC KBP, the precision is higher compared to the recall. Since Wikidata is not fully built from Wikipedia, MAMA could improve the recall by running on larger corpora (Raffel et al., 2019; Brown et al., 2020) to collect more facts.

3.2. ANALYSIS OF UNMAPPED FACTS

The open KG constructed by MAMA is a new type of KG combining the fixed KG schema with the flexible open schema. We now study the quality of the candidate facts that are not mapped to the above reference KG schema but are in the open schema generated by MAMA. We manually judge such unmapped facts generated by our best method, MAMA-GPT-2 XL, from 100 sampled documents in Wikidata and TAC KBP respectively. The quality of the unmapped facts is verified by human annotators. We find 35.3% of the unmapped facts are true on Wikidata, and 83.2% of those true facts are partially unmapped facts (Fader et al., 2011). Both entity linking and relation mapping in the Map stage rely heavily on the accuracy of entity detection from spaCy noun chunking; we conclude that the main root cause of the untrue unmapped facts is errors made by spaCy noun chunking. We observe similar trends on TAC KBP. We plan to leverage crowdsourcing platforms, e.g., Mechanical Turk, to conduct quantitative evaluations over the unmapped facts to better understand the strengths and shortcomings of MAMA. We plan to identify more accurate entities by relying on attention weights in LMs (Clark et al., 2019; Hewitt & Manning, 2019) instead of using extra resources. We will also investigate stronger entity linkers (Kolitsas et al., 2018) and learn a more robust relation mapping through weak or distant supervision (Mintz et al., 2009; Ratner et al., 2017). We will investigate more sophisticated approaches, such as graph neural networks (Kipf & Welling, 2016), to generate more accurate relation phrases from the attention weight matrices by considering structural information.

4. RELATED WORK

Knowledge graph construction can be generally categorized into two groups: 1) supervised approaches, where Wikidata, Freebase (Bollacker et al., 2008), YAGO (Suchanek et al., 2007), YAGO2 (Hoffart et al., 2013), and DBpedia (Auer et al., 2007) are built based on human supervision from Wikipedia infoboxes and other structured data sources; 2) semi-supervised approaches, where open information extraction systems, e.g., OLLIE (Schmitz et al., 2012), Reverb (Fader et al., 2011), Stanford OpenIE (Angeli et al., 2015), and OpenIE 5.1, leverage carefully designed patterns based on linguistic features (e.g., dependencies and POS tags) to extract triplets from web corpora for open-schema KGs. Besides, NELL (Carlson et al., 2010), DeepDive (Niu et al., 2012), and Knowledge Vault (Dong et al., 2014) extract information based on a fixed schema or ontology, where humans help improve the accuracy of the extractions. Probase (Wu et al., 2012) produces taxonomies instead of the richly typed relations in general KGs. MAMA instead uses the learned knowledge stored in pre-trained LMs, without human supervision, to construct an open KG that is a mixture of fixed schema and open schema. Different from commonsense knowledge construction using Transformers (Davison et al., 2019; Bosselut et al., 2019), the proposed method is unsupervised and end-to-end, and constructs general-purpose KGs instead of commonsense knowledge.

Language models, e.g., BERT (Devlin et al., 2018), GPT (Radford et al., 2018), GPT-2/3 (Radford et al., 2019; Brown et al., 2020), ELMo (Peters et al., 2018), Transformer-XL (Dai et al., 2019), ALBERT (Lan et al., 2019), RoBERTa (Liu et al., 2019), XLNet (Yang et al., 2019), and Megatron-LM (Shoeybi et al., 2019), contain factual knowledge obtained via pre-training on large-scale corpora such as Wikipedia and BookCorpus (Zhu et al., 2015).
Studies have leveraged pre-trained LMs as virtual KGs and show reasonable performance on QA tasks (Dhingra et al., 2020; Guu et al., 2020) and language modeling (Khandelwal et al., 2019). LMs are further enhanced by KGs (Peters et al., 2019) to improve knowledge-driven tasks. While the existing work utilizes knowledge in an implicit way, the main difference is that our approach explicitly extracts knowledgeable facts from the LMs. Compared to joint training with a knowledge base to improve shallow word embeddings (Wang et al., 2014), we show that the knowledge is already stored in deep LMs. We plan to incorporate domain knowledge into language models to construct domain-specific KGs. The difference between LAMA (Petroni et al., 2019; 2020) and MAMA is two-fold: (1) LAMA aims to complete Cloze-style statements, e.g., given "Dylan is a ___", LAMA predicts which words/phrases should fill the blank, which has no direct connection to KGs. There are several fundamental limitations when adapting LAMA to construct KGs, e.g., additional queries must be constructed first, and the answers to the queries must be linked to KGs. MAMA instead solves a reasoning problem: given a passage, MAMA directly matches a fact in the form of a triplet, e.g., (Dylan, is, songwriter), in the first step, then maps the fact to produce a KG. (2) The benchmark datasets used with MAMA are larger compared to the LAMA benchmark, e.g., Wikidata is 3 orders of magnitude larger than the largest dataset in the LAMA benchmark.

Neural network interpretation here specifically refers to pre-trained deep language model analysis. There has been a lot of work on understanding what neural networks learn (Linzen et al., 2016; Adi et al., 2016; Tenney et al., 2019).
With regards to analyzing Transformer (Vaswani et al., 2017) based language models (e.g., BERT and GPT-3), substantial recent work focuses on both visualizing and analyzing the attention (Vig, 2019; Jain & Wallace, 2019; Clark et al., 2019; Michel et al., 2019; Vig et al., 2020; Ramsauer et al., 2020; Hendrycks et al., 2020) . Instead of analyzing or visualizing, we use LMs to generate structured KGs to directly recover what LMs learn from the corpora.

5. CONCLUSION

We show that knowledge graphs can be constructed with a single forward pass of language models over textual corpora. We propose a two-stage unsupervised approach, MAMA, that first matches the facts in the corpus with the internal knowledge of the language model, and then maps the matched facts to produce a knowledge graph. We demonstrate the quality of the resultant open knowledge graphs by comparing to two knowledge graphs (Wikidata and TAC KBP). The open knowledge graph also features new facts in the open schema, which could have broad implications for knowledge graphs and their downstream applications. The results also suggest that larger language models store richer knowledge than existing knowledge graphs, and that running on even larger high-quality text corpora could continue to improve the constructed knowledge graphs. Additionally, the knowledge graphs generated by our approach can help researchers look into what the language models learn, so our interpretable knowledge graphs establish a bridge between the deep learning and knowledge graph communities.

A ADDITIONAL DETAILS AND ANALYSIS OF MAMA

A.1 METHOD DETAILS

Map stage details For the Map stage on TAC KBP, we link to the oracle annotations of the entities or spans in the TAC KBP corpus. On Wikidata, the entity linking method described in Sec. 2.2.1 is first leveraged to link entities in the candidate facts to Wikipedia anchors. We build an enhanced mention-to-entity dictionary based on Spitkovsky & Chang (2012); in particular, we add new Wikipedia anchors to the dictionary, which results in 26 million entries compared to the 21 million entries in Spitkovsky & Chang (2012). A Wikipedia-anchor-to-Wikidata-item dictionary is then constructed and used to further link the entities to Wikidata. If the head or tail is a pronoun, we further use neuralcoref for coreference resolution. We use GloVe (Pennington et al., 2014) embeddings for disambiguation. The relation mapping is constructed offline for TAC KBP and Wikidata respectively using the method in Sec. 2.2.1. Besides the automatic relation mapping method proposed in Angeli et al. (2015), we manually check whether the top relation phrases are true, as described in Sec. 2.2.1. For relation mapping, we randomly sampled a hold-out dataset of 2,000 documents from the TAC KBP corpus and English Wikipedia for the relation mapping construction on TAC KBP and Wikidata respectively. For oracle facts in Wikidata, we only preserve facts describing relations between entities that can be linked to corresponding Wikipedia anchors. We rule out facts of attributes about entities and facts of auxiliary relations (such as topic's main category.P901), finally resulting in 27,368,562 oracle facts.

Implementation details For Wikidata, at the Match stage, we randomly split the English Wikipedia data into 20 partitions, and map the data partitions to 20 distributed servers. Each server is configured with four Tesla K80 12G GPUs.
We set the max sequence length to 256, and the batch size to 32 for MAMA-BERT LARGE and 4 for MAMA-GPT-2 XL. We use the implementations of pre-trained LMs in the Transformers package. We use the spaCy sentencizer to segment the documents into sentences.

Parameter settings The parameter settings are shared across TAC KBP and Wikidata. All choices are based on the parameter study in Sec. A.3. The beam size of Algorithm 1 is set to 6. The matching degree threshold of Constraint #1 (Sec. 2.1.2) is set to 0.005, and the number of distinct head-tail pairs of Constraint #2 (Sec. 2.1.2) is set to 10. To generate the attention weight matrix A_s of a sentence, we reduce the weights of every attention head in the last layer of the pre-trained LMs using the mean operator.
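The last-layer head reduction can be sketched as follows. This is a pure-Python toy (list-of-lists matrices and two tiny heads are our stand-ins); a real implementation would apply the same reduction to the LM's (heads, seq, seq) attention tensor:

```python
# Collapse the per-head attention matrices of the last layer into a
# single matrix, so every token pair is associated with one weight.
def reduce_heads(last_layer_attn, op="mean"):
    """last_layer_attn: list of per-head matrices, each seq x seq."""
    heads = len(last_layer_attn)
    seq = len(last_layer_attn[0])
    if op == "mean":                      # reduction used by MAMA
        combine = lambda vals: sum(vals) / heads
    elif op == "max":                     # alternative compared in Sec. A.3
        combine = max
    else:
        raise ValueError(op)
    return [[combine([last_layer_attn[h][i][j] for h in range(heads)])
             for j in range(seq)] for i in range(seq)]

# Two toy heads over a 2-token sentence.
head_a = [[0.1, 0.9], [0.3, 0.7]]
head_b = [[0.5, 0.5], [0.1, 0.9]]
A_s = reduce_heads([head_a, head_b], op="mean")
print([[round(v, 6) for v in row] for row in A_s])
# [[0.3, 0.7], [0.2, 0.8]]
```

The resulting single matrix A_s is what the beam search of Algorithm 1 consumes.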

A.2 ERROR ANALYSIS

There is still significant room to improve MAMA. To further understand its shortcomings, we conduct an error analysis of the errors in precision (i.e., incorrect facts returned by MAMA) in Table 2 and Table 3. We choose our best method, MAMA-GPT-2 XL, for the study. We sample 100 documents from the Wikidata dataset and manually check the reasons for the errors. We find 33.1% of the errors are caused by incorrect entities, while the relation phrases are correct; these errors are due to incorrect noun chunks detected by spaCy. 18.3% of the errors are due to missing entries in the relation mapping created in Sec. 2.2.1. Note that we find approximately 23.8% of the errors are actually correct facts that are new to the reference KGs, e.g., (Bob Dylan.Q392, residence.P551, Nashville.Q23197) (in Figure 16) is not an existing fact in Wikidata, but it is a correct mapped fact based on our annotation. The remaining errors made by MAMA-GPT-2 XL are incorrect relation phrases, such as uninformative relation phrases. We find similar errors made by MAMA-GPT-2 XL on TAC KBP. Similar to Sec. 3.2, enhancing the entity detection, entity linking, relation mapping, and relation generation would help. We also plan to leverage lifelong learning (Carlson et al., 2010) to add true facts to the reference KGs to improve the evaluation.

A.3 PARAMETER STUDY

We study the effects of the parameters using MAMA-BERT BASE on TAC KBP. We randomly sample 20% of the oracle query entities as a hold-out dataset to tune the parameters, and use the best parameter setting for both the TAC KBP and Wikidata experiments. When studying the effect of a certain parameter, we keep the remaining parameters at the defaults described in Sec. A.1. We use F1 to measure the effects.

Effects of beam size Figure 3(a) illustrates the effects of various beam sizes in Algorithm 1. We find that, in general, the larger the beam size, the better the F1. This is because MAMA is able to reserve more potentially correct facts when more candidates are allowed in the Match stage. However, the F1 improvement gradually becomes subtle, while the computation cost increases more significantly. For the sake of efficiency, we do not explore larger beam sizes, and set the beam size to 6.

Effects of search constraints Figure 3(b) compares the effect of different thresholds on the matching degree of Constraint #1 in Sec. 2.1.2. We set the threshold to 0.005 since it achieves the best result. Note that the summed attention score is normalized by the length of the fact to penalize cumbersome facts. The matching degree threshold is effective, mainly because of the knowledge contained in the self-attention matrices: the score in the attention matrix represents the chance of a fact being true based on the stored knowledge. Figure 3(c) shows the impact of the number of distinct head-tail pairs used to identify common relations for Constraint #2 in Sec. 2.1.2. The best result is achieved when it equals 10. This shows that while MAMA mostly identifies frequent relations, it is also able to capture some rare relations for the open schema.

Effects of attention weights Figure 3(d) shows the comparison between attention weights of the last layer and the mean of all layers. The attention weights of the last layer perform better.
This is due to the attention weights in lower layers are low-level linguistic knowledge according to (Clark et al., 2019; Ramsauer et al., 2020) , which are less relevant to the factual knowledge for the KG construction. Figure 3 (e) compares the impact of different attention reduction, i.e., mean, max, over the attention heads of the last layer. We find the "mean" perform better. The reason is that the token often intensively attends to several specific tokens in the sequence (Michel et al., 2019) , and the "mean" operator is more sensitive to such information.

Effects of attention weights

Figure
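The head-reduction choice can be sketched as follows. This is a minimal illustration, not the authors' implementation; the attention tensor is assumed, for simplicity, to be a nested list of shape [num_heads][seq_len][seq_len] for the last layer.

```python
# Reduce per-head attention weights of the last layer into one matrix.
# "mean" averages over heads; "max" keeps only the strongest head per entry.

def reduce_heads(attn, how="mean"):
    """attn: [num_heads][seq_len][seq_len] attention weights."""
    num_heads, seq_len = len(attn), len(attn[0])
    reduced = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            vals = [attn[h][i][j] for h in range(num_heads)]
            reduced[i][j] = sum(vals) / num_heads if how == "mean" else max(vals)
    return reduced

# Toy example: two heads over a two-token sequence.
attn = [
    [[0.75, 0.25], [0.5, 0.5]],
    [[0.25, 0.75], [0.75, 0.25]],
]
print(reduce_heads(attn, "mean"))  # [[0.5, 0.5], [0.625, 0.375]]
```

Since a token often attends strongly to only a few positions, "max" is dominated by a single head, while "mean" retains the consensus across heads, consistent with the observation above.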

B SAMPLES FROM MAMA ON TAC KBP

B.1 MAPPED FACTS

We randomly sample 100 documents from the TAC KBP corpus, then randomly sample sentences from those documents. The uncurated candidate facts and the corresponding mapped facts of the sampled sentences produced by our best methods, MAMA-BERT LARGE and MAMA-GPT-2 XL, are shown in Figure 4 and Figure 5, respectively. We also randomly sample several sentences on which MAMA-BERT LARGE differs from MAMA-GPT-2 XL in the resulting facts for comparison, illustrated in Figure 6. In each table, "ID" is the document ID of a sampled sentence in the TAC KBP corpus, "Sentence" is the sampled sentence, and the "Candidate facts to mapped facts" column contains the candidate facts (on the left side of "→") and their corresponding mapped facts (on the right side of "→").

B.2 UNMAPPED FACTS

We randomly sample 100 documents from the TAC KBP corpus and show unmapped facts from sentences sampled from those documents. We manually check the correctness of the unmapped facts according to Sec. 3.2, and show the correct ones. The original candidate facts with the corresponding unmapped facts generated by MAMA-BERT LARGE and MAMA-GPT-2 XL are shown in Figure 7 and Figure 8. A further comparison of the unmapped candidate facts is illustrated in Figure 9. In each table, "ID" is the document ID of a sampled sentence in the TAC KBP corpus, "Sentence" is the sampled sentence, and the "Candidate facts to unmapped facts" column contains the candidate facts (on the left side of "→") and their corresponding unmapped facts (on the right side of "→").

C SAMPLES FROM MAMA ON WIKIDATA

C.1 MAPPED FACTS

Similar to TAC KBP, we randomly sample 100 documents from the Wikidata corpus (i.e., English Wikipedia), then randomly sample sentences from those documents. The uncurated candidate facts and the corresponding mapped facts produced by our best methods, MAMA-BERT LARGE and MAMA-GPT-2 XL, are shown in Figure 10 and Figure 11, respectively. We also randomly sample several sentences on which MAMA-BERT LARGE differs from MAMA-GPT-2 XL in the resulting facts for comparison, illustrated in Figure 12. In each table, "ID" is the Wikipedia page title of a sampled sentence, "Sentence" is the sampled sentence, and the "Candidate facts to mapped facts" column contains the candidate facts (on the left side of "→") and their corresponding mapped facts (on the right side of "→").

C.2 UNMAPPED FACTS

Similar to TAC KBP, we randomly sample 100 documents from the Wikidata corpus and show unmapped facts from sentences sampled from those documents. We manually check the correctness of the unmapped facts according to Sec. 3.2, and show the correct ones. The original candidate facts with the corresponding unmapped facts generated by MAMA-BERT LARGE and MAMA-GPT-2 XL are shown in Figure 13 and Figure 14. A further comparison of the unmapped candidate facts is illustrated in Figure 15. In each table, "ID" is the Wikipedia page title of a sampled sentence, "Sentence" is the sampled sentence, and the "Candidate facts to unmapped facts" column contains the candidate facts (on the left side of "→") and their corresponding unmapped facts (on the right side of "→").

D ADDITIONAL OPEN KG SUBGRAPHS FROM MAMA ON WIKIDATA

We sample several documents from the Wikidata corpus and visualize the mapped and unmapped facts from those documents as examples of subgraphs in the resulting open KGs. The snapshots of the subgraphs generated by MAMA-BERT LARGE are shown in Figure 16 to Figure 24, and those generated by MAMA-GPT-2 XL in Figure 25 to Figure 32.


Proposed Approach

• MAMA constructs an open knowledge graph with a single forward pass of the language model (without fine-tuning) over the corpus 



We use the terms "head" and "tail" to denote the head and tail entities (or entity mentions) for simplicity.
https://github.com/dair-iitd/OpenIE-standalone
There are 2,383 correct oracle facts based on the "manual runs" assessment in TAC KBP.
https://github.com/huggingface/neuralcoref
https://github.com/huggingface/transformers
6 https://spacy.io/api/sentencizer
7 https://spacy.io/usage/linguistic-features/#noun-chunks



Figure 1: Overview of the proposed approach MAMA. MAMA constructs an open knowledge graph (KG)

MAMA-BERT LARGE takes approximately 48 hours, and MAMA-GPT-2 XL costs around 96 hours. The resulting candidate facts of the Match stage from the 20 servers are then reduced to a data server, where a MongoDB database is maintained to store the oracle Wikidata and entity linking results to enable an efficient Map stage. To produce the open KGs, the Map stage takes around 18 hours. The setup for TAC KBP is similar; the Match stage is done within 48 hours for all settings. The batch sizes of MAMA-BERT BASE, MAMA-GPT-2, MAMA-GPT-2 MEDIUM, and MAMA-GPT-2 LARGE are 64, 32, 16, and 8, respectively.



Figure 3: Parameter study with MAMA-BERTBASE on TAC KBP hold-out subset.

Figure 4: Mapped facts: MAMA-BERTLARGE on TAC KBP.

Figure 7: Unmapped facts: MAMA-BERTLARGE on TAC KBP.


Figure 17: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Douglas Bader".

Figure 18: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Helen Storrow".

Figure 19: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Jacob van Ruisdael".

Figure 20: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "John Maynard Keynes".

Figure 21: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Liaquat Ali Khan".

Figure 22: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Neville Southall".

Figure 23: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Pauline Baynes".

Figure 24: A snapshot subgraph of the open KG generated by MAMA-BERTLARGE from the Wikipedia page "Thor Heyerdahl".

Figure 25: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Douglas Bader".

Figure 26: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Helen Storrow".

Figure 27: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Jacob van Ruisdael".

Figure 28: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "John Maynard Keynes".

Figure 29: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Liaquat Ali Khan".

Figure 30: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Neville Southall".

Figure 31: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Pauline Baynes".

Figure 32: A snapshot subgraph of the open KG generated by MAMA-GPT-2XL from the Wikipedia page "Thor Heyerdahl".

Input: head-tail pair (h, t), sentence s, attention matrix A_s, action manager O = {START, YIELD, STOP}, beam size k
Output: candidate facts T_(h,t)
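Algorithm 1 itself is not reproduced here; the following is a simplified, hypothetical sketch of the Match-stage search over these inputs: starting from the head (START), a beam search repeatedly follows attention weights to later tokens (YIELD) until the tail is reached (STOP), and the tokens visited in between form the relation phrase. The toy sentence and attention matrix below are assumptions for illustration only.

```python
# Simplified beam search over an attention matrix: from the head token,
# repeatedly yield subsequent tokens weighted by attention until the tail
# token is reached; intermediate tokens form the relation phrase.

def match(tokens, attn, head_idx, tail_idx, beam_size=6):
    """attn[i][j]: attention weight between positions i and j.
    Returns candidate (head, relation, tail) facts with normalized scores."""
    beams = [([head_idx], 0.0)]          # START: begin at the head
    finished = []
    while beams:
        new_beams = []
        for path, score in beams:
            cur = path[-1]
            for nxt in range(cur + 1, len(tokens)):   # move left to right
                cand = (path + [nxt], score + attn[cur][nxt])
                if nxt == tail_idx:                   # STOP: tail reached
                    finished.append(cand)
                else:                                 # YIELD: keep growing
                    new_beams.append(cand)
        new_beams.sort(key=lambda b: -b[1])
        beams = new_beams[:beam_size]
    facts = []
    for path, score in finished:
        relation = " ".join(tokens[i] for i in path[1:-1])
        # Normalize by fact length to penalize cumbersome facts (Sec. 2.1.2).
        facts.append(((tokens[head_idx], relation, tokens[tail_idx]),
                      score / len(path)))
    return sorted(facts, key=lambda f: -f[1])

tokens = ["Dylan", "signed", "with", "Grossman"]
attn = [[0.0, 0.6, 0.1, 0.2],
        [0.0, 0.0, 0.5, 0.3],
        [0.0, 0.0, 0.0, 0.7],
        [0.0, 0.0, 0.0, 0.0]]
print(match(tokens, attn, 0, 3)[0][0])  # ('Dylan', 'signed with', 'Grossman')
```

The length normalization is where the matching-degree threshold of Constraint #1 would be applied: candidates whose normalized score falls below the threshold are dropped.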

The objective of the Map stage is to generate an open KG. The open KG contains (a) mapped facts in a KG schema (Sec. 2.2.1), e.g., the Wikidata schema, if the schema of the candidate facts is within the existing KG schema; and (b) the remaining unmapped facts in an open schema (Sec. 2.2.2).

Partially unmapped facts mean that at least one of h, r, and t is mapped to the KG schema: h or t may be mapped to h_k or t_k by the entity linker in Sec. 2.2.1, or r may be mapped to r_k using the relation mapping in Sec. 2.2.1. This results in unmapped facts that mix the KG schema and the open schema. As the overall schema of such unmapped facts is not in the KG schema, we use "open schema" to denote these unmapped facts in the rest of the paper for simplicity. An example is (Dylan, signed, Albert Grossman) in Figure 1, where both head and tail are linked to the Wikidata schema by the entity linker in Sec. 2.2.1, but the relation cannot be mapped since there is no relation mapping from "signed" to a KG relation in the Wikidata schema.

Completely unmapped facts indicate that none of h, r, and t is mapped to the KG schema: neither the entity linker nor the relation mapping is able to map h, r, and t to h_k, r_k, and t_k, respectively. The resulting unmapped candidate facts stay in the open schema, e.g., the candidate fact (Jacob, was, A Registered Mennonite) from the sentence "Jacob was a registered Mennonite in Amsterdam." stays the same in the open schema.
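As a rough sketch of this three-way classification (not the paper's actual implementation), suppose the entity linker and relation mapping of Sec. 2.2.1 are given as simple lookup tables; the table contents below are toy assumptions built from examples appearing in this paper:

```python
# Classify a candidate fact (h, r, t) by how much of it maps to the KG
# (Wikidata) schema: fully mapped, partially unmapped, or completely unmapped.

ENTITY_LINKER = {"Dylan": "Bob Dylan.Q392", "Nashville": "Nashville.Q23197"}
RELATION_MAP = {"residence": "residence.P551"}

def map_fact(h, r, t):
    h_k = ENTITY_LINKER.get(h)   # entity linking for the head
    r_k = RELATION_MAP.get(r)    # relation mapping
    t_k = ENTITY_LINKER.get(t)   # entity linking for the tail
    fact = (h_k or h, r_k or r, t_k or t)
    if h_k and r_k and t_k:
        return "mapped", fact
    if h_k or r_k or t_k:
        return "partially unmapped", fact
    return "completely unmapped", fact

print(map_fact("Dylan", "residence", "Nashville"))
# → ('mapped', ('Bob Dylan.Q392', 'residence.P551', 'Nashville.Q23197'))
print(map_fact("Dylan", "signed", "Nashville")[0])            # partially unmapped
print(map_fact("Jacob", "was", "A Registered Mennonite")[0])  # completely unmapped
```

Both kinds of unmapped facts end up in the open schema; the distinction only reflects how many components the linker and relation mapping could resolve.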

Dataset statistics of the two knowledge graphs: TAC KBP and Wikidata. TAC KBP refers to the TAC KBP Slot Filling 2013 challenge. The numbers of oracle facts and documents for TAC KBP are those of the 2013 task. The number of oracle facts for Wikidata is the total number of oracle facts in Wikidata, and the number of documents for Wikidata is the size of English Wikipedia.

To measure the ability of LMs to generate KGs, we directly measure the quality of the resulting open KGs. The open KG contains two types of facts: mapped facts in the fixed KG schema, and unmapped facts in the open schema. We first quantitatively evaluate MAMA by comparing the mapped facts to oracle KGs annotated by humans in Sec. 3.1, then conduct an in-depth analysis of the unmapped facts in Sec. 3.2.

Comparison of the quality of mapped facts on TAC KBP. "#Params of LM" refers to the number of parameters of the pre-trained LM.

Table 2 shows the results on TAC KBP. We use the official scorer of TAC KBP Slot Filling 2013 to evaluate precision, recall, and F1 on TAC KBP 3.

MAMA constructs improved KGs compared to open IE. From the results, we find that all our methods achieve competitive precision (greater than 60%) given the unsupervised nature of MAMA, and all the proposed methods outperform the two open IE systems. This shows that MAMA is able to produce high-quality knowledge directly from pre-trained LMs with a single forward pass and without human supervision, demonstrating the effectiveness of MAMA in generating candidate facts in the Match stage and producing high-quality KGs through the Map stage. We also find that MAMA-GPT-2 XL performs the best, outperforming the previous state-of-the-art Stanford OpenIE by over 2.6% in F1. This shows that the proposed end-to-end MAMA is able to recover the knowledge stored in pre-trained LMs without relying on extra linguistic features, such as the POS tags and dependency parses used in open IE systems. The main reason for the moderate results of OpenIE 5.1 is that the system generates triplet objects with extraneous words, which hurts performance on slot filling tasks. Although the proposed methods all outperform the two open IE systems in recall, improving recall is clearly the future direction for further improving MAMA. We find that the main cause of the moderate recall is incorrect entities caused by spaCy noun chunks, as summarized in Sec. A.2.
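The metrics themselves are standard; as a minimal illustration (not the official TAC KBP scorer, which applies stricter slot-level matching), precision, recall, and F1 over predicted facts against oracle facts can be computed as:

```python
# Precision / recall / F1 of predicted facts against oracle facts,
# treating facts as exact-match triples for simplicity.

def prf1(predicted, oracle):
    predicted, oracle = set(predicted), set(oracle)
    tp = len(predicted & oracle)                      # true positives
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(oracle) if oracle else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

pred = [("A", "per:title", "chairman"), ("B", "per:spouse", "C")]
gold = [("A", "per:title", "chairman"), ("D", "per:origin", "E")]
print(prf1(pred, gold))  # (0.5, 0.5, 0.5)
```

The triple names here are hypothetical placeholders; the precision-recall trade-off it exposes mirrors the discussion above, where precision is high but recall limits F1.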

Comparison of the quality of mapped facts on Wikidata. "#Params of LM" refers to the number of parameters of the pre-trained LM.


Figure 12: Mapped facts: MAMA-BERTLARGE vs. MAMA-GPT-2XL on Wikidata.

• Match: generates a set of candidate facts from a textual corpus
• Map: produces an open knowledge graph from the matched candidates


An Open Question

• Assumption: Language models and knowledge graphs just encode the same world knowledge in two different formats, so the two formats should be equivalent. But is the assumption true?

Contributions

• Problem: How to construct knowledge graphs from pre-trained language models.
• Approach: An unsupervised two-stage approach that constructs knowledge graphs with a single forward pass of a pre-trained language model (without fine-tuning) over a textual corpus, outperforming compared methods by over 5.6% in F1 on Wikidata.
• Result: The open knowledge graphs not only cover the knowledge already in existing knowledge graphs (e.g., Wikidata), but also feature open factual knowledge that is new.

