VER: LEARNING NATURAL LANGUAGE REPRESENTATIONS FOR VERBALIZING ENTITIES AND RELATIONS

Abstract

Entities and the relationships between them are vital in the real world. Essentially, we understand the world by understanding entities and relations. For instance, to understand a field, e.g., computer science, we need to understand the relevant concepts, e.g., machine learning, and the relationships between concepts, e.g., between machine learning and artificial intelligence. To understand a person, we should first know who they are and how they are related to others. To understand entities and relations, humans may refer to natural language descriptions. For instance, when learning a new scientific term, people usually start by reading its definition in dictionaries or encyclopedias. To know the relationship between two entities, humans tend to create a sentence that connects them. In this paper, we propose VER: A Unified Model for Verbalizing Entities and Relations. Specifically, we attempt to build a system that takes any entity or set of entities as input and generates a sentence representing the entities and their relations, which we call a "natural language representation". Extensive experiments demonstrate that our model can generate high-quality sentences describing entities and entity relationships and facilitate various tasks on entities and relations, including definition modeling, relation modeling, and generative commonsense reasoning.

1. INTRODUCTION

What is X? What is the relationship between X and Y? We ask these questions almost every day. When we come across a new term, e.g., twin prime, we usually refer to its definition to understand it, i.e., "A twin prime is a prime number that is either 2 less or 2 more than another prime number". To express our understanding of the relationship between entities (e.g., carbon dioxide and water), we create a sentence representing that relationship: "Carbon dioxide is soluble in water". Basically, we understand entities and relations by "verbalizing" them. Verbalizing entities and relations also tests our knowledge of them. In short, by verbalizing entities and relations, we understand the world. Similarly, do machines have the ability to verbalize entities and relations? Can machines learn about entities and relations by verbalizing them? The answer is "Yes". Recent studies show that, given the surface name of an entity (and its context), models (after training) can generate coherent sentences to represent it, i.e., definition modeling (Noraset et al., 2017; Gadetsky et al., 2018; Bevilacqua et al., 2020; August et al., 2022; Huang et al., 2021b; Gardner et al., 2022), and given the surface names of a pair of entities, machines can generate coherent sentences describing their relationship, i.e., (open) relation modeling (Huang et al., 2022a;b). However, verbalizing entities requires understanding the relationships between entities, and verbalizing entity relationships requires understanding the entities themselves, while existing works deal with entity and relation verbalization separately, ignoring the connections between them.
Moreover, recent works (Devlin et al., 2019; Lewis et al., 2020; Radford et al., 2019; Brown et al., 2020) have shown that large language models pre-trained with self-supervised objectives acquire a significant amount of knowledge (Petroni et al., 2019; Roberts et al., 2020) and achieve substantial gains after fine-tuning on a specific task. Can we continually pre-train models with objectives on entities and relations to enhance their ability to verbalize them? In this way, a model can be more easily and effectively adapted to specific tasks on entities and relations, and can even be used without additional training. Therefore, we aim to solve entity and relation verbalization in a unified form and pre-train a model for entity and relation understanding. Essentially, definition modeling and relation modeling can be unified as an "entity(s) → sentence" task: given a set of entities, generate a sentence describing the entities and their relationships. When the size of the set is 1, the task is equivalent to definition modeling; when the size is 2, it is equivalent to relation modeling. Defining the task in this form also lets us model more complex relationships among entities, since entity relationships can go beyond pairwise (Bretto, 2013); we name this hyper-relation modeling, e.g., {carbon dioxide, water, carbonic acid} → "Carbon dioxide reacts with water to produce carbonic acid". Based on this, we propose VER: A Unified Model for Verbalizing Entities and Relations (Figure 1). Specifically, we pre-train models on a self-supervised text reconstruction task: given an entity or a set of entities, reconstruct the original sentences (e.g., a definition or a relation description) containing them in the training corpus. In this way, the models acquire knowledge about entities and relations and learn to connect entities into a meaningful, coherent sentence.
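The "entity(s) → sentence" reconstruction pairs described above can be illustrated with a minimal sketch. The corpus, entity vocabulary, and function name below are hypothetical, not the paper's actual pipeline; real pre-training data construction would use entity linking at scale rather than substring matching.

```python
# Hypothetical sketch of building (entity-set -> sentence) reconstruction pairs.
# A 1-entity pair corresponds to definition modeling, a 2-entity pair to
# relation modeling, and a 3+-entity pair to hyper-relation modeling.

def build_reconstruction_pairs(corpus, entity_vocab):
    """For each sentence, collect the vocabulary entities it mentions and
    emit an (entity-set, sentence) training pair."""
    pairs = []
    for sentence in corpus:
        lowered = sentence.lower()
        entities = tuple(sorted(e for e in entity_vocab if e in lowered))
        if entities:  # skip sentences mentioning no known entity
            pairs.append((entities, sentence))
    return pairs

corpus = [
    "A twin prime is a prime number that is either 2 less or 2 more than another prime number.",
    "Carbon dioxide is soluble in water.",
    "Carbon dioxide reacts with water to produce carbonic acid.",
]
entity_vocab = ["twin prime", "carbon dioxide", "water", "carbonic acid"]

for entities, sentence in build_reconstruction_pairs(corpus, entity_vocab):
    print(entities, "->", sentence)
```

The three sentences yield a definition-modeling pair ({twin prime}), a relation-modeling pair ({carbon dioxide, water}), and a hyper-relation pair ({carbon dioxide, water, carbonic acid}), matching the three task sizes unified by the paper's formulation.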
From the perspective of representation learning, we propose to learn the "natural language representations" of entities and relations. Compared to latent vector representations, natural language representations are more interpretable, since humans can understand them by reading the text, whereas hidden representations are difficult to interpret. Compared to structural representations with pre-specified rules, e.g., a sub-knowledge graph, natural language representations are more open owing to the flexibility of free text. Experiments on six datasets demonstrate the superiority of our model in verbalizing entities and relations. Especially in low-resource settings, our model achieves significantly better results than BART (Lewis et al., 2020) on definition modeling, relation modeling, and hyper-relation modeling (generative commonsense reasoning (Lin et al., 2020)). In addition, the performance of VER without additional training is impressive, making it a potential knowledge source for entities and relations, which may benefit tasks such as entity typing (Ren et al., 2016), relation extraction (Bach & Badaskar, 2007), and knowledge graph completion (Lin et al., 2015). The main contributions of our work are summarized as follows:

• We connect definition modeling, relation modeling, and hyper-relation modeling in a unified form;

• We pre-train VER on large-scale training data by forming the "entity(s) → sentence" reconstruction task, which makes VER a useful tool for learning natural language representations of entities and relations;

• Extensive experiments demonstrate that our model achieves better results in verbalizing entities and relations, especially in low-resource settings.



We release the VER-base model at this anonymous repository: https://osf.io/7csnf/?view_only=91ec67e05bd44f998d71e63d9cdd25a4. VER-large and the pre-training data will be released as open source after the review process (since the anonymous repository has a space limitation).



Figure 1: A diagram of VER. We feed the model an entity or a set of entities and train it to reconstruct sentences containing all of the entities. This allows us to use a single model to better "verbalize" entities and complex entity relationships.

