VER: LEARNING NATURAL LANGUAGE REPRESENTATIONS FOR VERBALIZING ENTITIES AND RELATIONS

Abstract

Entities and relationships between entities are vital in the real world. Essentially, we understand the world by understanding entities and relations. For instance, to understand a field, e.g., computer science, we need to understand the relevant concepts, e.g., machine learning, and the relationships between concepts, e.g., between machine learning and artificial intelligence. To understand a person, we should first know who he/she is and how he/she is related to others. To understand entities and relations, humans may refer to natural language descriptions. For instance, when learning a new scientific term, people usually start by reading its definition in dictionaries or encyclopedias. To know the relationship between two entities, humans tend to create a sentence connecting them. In this paper, we propose VER: A Unified Model for Verbalizing Entities and Relations. Specifically, we attempt to build a system that takes any entity or entity set as input and generates a sentence to represent entities and relations, named a "natural language representation". Extensive experiments demonstrate that our model can generate high-quality sentences describing entities and entity relationships and facilitate various tasks on entities and relations, including definition modeling, relation modeling, and generative commonsense reasoning.¹

1. INTRODUCTION

What is X? What is the relationship between X and Y? We come up with these questions almost every day. When we come across a new term, e.g., twin prime, we usually refer to its definition to understand it, e.g., "A twin prime is a prime number that is either 2 less or 2 more than another prime number". To express our understanding of the relationship between two entities (e.g., carbon dioxide and water), we create a sentence to represent their relationship: "Carbon dioxide is soluble in water". Basically, we understand entities and relations by "verbalizing" them. Verbalizing entities and relations also tests our knowledge about them. Literally, by verbalizing entities and relations, we understand the world. Similarly, do machines have the ability to verbalize entities and relations? Can machines learn about entities and relations by verbalizing them? The answer is "Yes". Recent studies show that, given the surface name of an entity (and its context), models (after training) can generate coherent sentences to represent it, i.e., definition modeling (Noraset et al., 2017; Gadetsky et al., 2018; Bevilacqua et al., 2020; August et al., 2022; Huang et al., 2021b; Gardner et al., 2022), and given the surface names of a pair of entities, machines can generate coherent sentences describing their relationships, i.e., (open) relation modeling (Huang et al., 2022a;b).

However, verbalizing entities requires understanding relationships between entities, and verbalizing entity relationships requires understanding the entities themselves, while existing works deal with entity and relation verbalization separately, ignoring the connections between them. Besides, recent works (Devlin et al., 2019; Lewis et al., 2020; Radford et al., 2019; Brown et al., 2020) have shown that large language models pre-trained with self-supervised objectives can equip
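The unified framing above — a single model that takes an entity or an entity set and emits a descriptive sentence — can be viewed as one sequence-to-sequence interface covering both definition modeling and relation modeling. A minimal sketch of such an input encoding follows; the prompt tokens (`define:`, `relate:`, `<sep>`) are illustrative assumptions for this sketch, not the paper's actual input format.

```python
def verbalization_input(entities):
    """Format an entity or a set of entities as one seq2seq source string,
    so a single model can handle both verbalization tasks.

    - one entity      -> definition modeling  ("define: <e>")
    - two or more     -> relation modeling    ("relate: <e1> <sep> <e2> ...")

    The prompt tokens here are hypothetical, chosen only to illustrate
    how the two tasks can share one input interface.
    """
    if isinstance(entities, str):
        entities = [entities]
    if len(entities) == 1:
        return f"define: {entities[0]}"
    return "relate: " + " <sep> ".join(entities)


# Example inputs from the paper's running examples:
print(verbalization_input("twin prime"))
# -> "define: twin prime"
print(verbalization_input(["carbon dioxide", "water"]))
# -> "relate: carbon dioxide <sep> water"
```

A sequence-to-sequence model trained on such unified inputs would then be expected to decode the corresponding natural language representation, e.g., the definition of "twin prime" or the sentence "Carbon dioxide is soluble in water".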



¹We release the VER-base model in this anonymous repository: https://osf.io/7csnf/?view_only=91ec67e05bd44f998d71e63d9cdd25a4. VER-large and the pre-training data will be released as open source after the review process (since the anonymous repository has a space limitation).

