PROTOTYPICAL REPRESENTATION LEARNING FOR RELATION EXTRACTION

Abstract

Recognizing relations between entities is a pivotal task of relational learning. Learning relation representations from distantly-labeled datasets is difficult because of the abundant label noise and complicated expressions in human language. This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data that are effective in different settings, including supervised, distantly supervised, and few-shot learning. Instead of solely relying on the supervision from noisy labels, we propose to learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Prototypes are representations in the feature space abstracting the essential semantics of relations between entities in sentences. We learn prototypes based on objectives with clear geometric interpretation, where the prototypes are unit vectors uniformly dispersed in a unit ball, and statement embeddings are centered at the end of their corresponding prototype vectors on the surface of the ball. This approach allows us to learn meaningful, interpretable prototypes for the final classification. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art models. We further demonstrate the robustness of the encoder and the interpretability of prototypes with extensive experiments.

1. INTRODUCTION

Relation extraction aims to predict relations between entities in sentences, which is crucial for understanding the structure of human knowledge and for automatically extending knowledge bases (Cohen & Hirsh, 1994; Bordes et al., 2013; Zeng et al., 2015; Schlichtkrull et al., 2018; Shen et al., 2020). Learning representations for relation extraction is challenging due to the rich forms of expression in human language, which usually contain fine-grained, complicated correlations between marked entities. Although many works have been proposed to learn relation representations from well-structured knowledge (Bordes et al., 2013; Lin et al., 2015; Ji et al., 2015), when the learning source is extended to unstructured, distantly-labeled text (Mintz et al., 2009), the task becomes particularly challenging due to spurious correlations and label noise (Riedel et al., 2010). This paper aims to learn predictive, interpretable, and robust relation representations from large-scale distantly-labeled data. We propose a prototype learning approach, in which we impose a prototype for each relation and learn representations from the semantics of each statement, rather than solely from the noisy distant labels. Statements are defined as sentences expressing a relation between two marked entities. As shown in Figure 1, a prototype is an embedding in the representation space capturing the most essential semantics shared by different statements of a given relation. These prototypes essentially serve as the centers of the representation clusters for different relations and are surrounded by statements expressing the same relation. We learn the relation and prototype representations based on objective functions with clear geometric interpretations.
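To make the prototype idea above concrete, here is a minimal, illustrative sketch (not the paper's exact formulation): once prototypes are learned, a statement embedding can be classified by assigning it to the most similar prototype. The relation names, 2-D vectors, and the choice of cosine similarity below are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as plain Python lists."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_prototype(statement_emb, prototypes):
    """Return the relation whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda rel: cosine(statement_emb, prototypes[rel]))

# Toy 2-D example with two hypothetical relation prototypes.
prototypes = {
    "Cause-Effect": [1.0, 0.0],
    "Component-Whole": [0.0, 1.0],
}
emb = [0.9, 0.2]  # a statement embedding lying near the Cause-Effect prototype
print(nearest_prototype(emb, prototypes))  # -> Cause-Effect
```

In practice the statement embedding would come from a trained sentence encoder and the prototypes would live in a much higher-dimensional space, but the nearest-prototype decision rule is the same.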
Specifically, our approach assumes that prototypes are unit vectors uniformly dispersed in a unit ball, and that statement embeddings are centered at the end of their corresponding prototype vectors on the surface of the ball. We propose statement-statement and prototype-statement objectives to ensure intra-class compactness and inter-class separability. Unlike the conventional cross-entropy loss, which only uses (potentially noisy) instance-level supervision, our objectives exploit the interactions among all statements, leading to a predictively powerful encoder with more interpretable and robust representations. We further explore these properties of the learned representations with extensive experiments. We apply our approach in a pretraining/fine-tuning paradigm: we first pretrain a relation encoder with prototypes on a large-scale distantly-labeled dataset, then fine-tune it on the target dataset under different relational learning settings. We further propose a probing dataset, FuzzyRED (Section 4.3), to verify whether our method captures the underlying semantics of statements. Experiments demonstrate the predictive performance, robustness, and interpretability of our method.
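The geometric intuition above can be sketched with two toy loss terms: one that pushes unit-norm prototypes apart so they disperse over the sphere, and one that pulls each statement embedding toward the tip of its relation's prototype vector. This is a hedged illustration under simplifying assumptions; the function names and the exact penalty forms (an exponential pairwise repulsion and a cosine-distance pull) are ours, not necessarily the paper's objectives.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dispersion_loss(prototypes):
    """Inter-class separability: penalize high pairwise similarity between
    unit-norm prototypes, encouraging them to spread over the unit sphere."""
    protos = [normalize(p) for p in prototypes]
    total, pairs = 0.0, 0
    for i in range(len(protos)):
        for j in range(i + 1, len(protos)):
            total += math.exp(dot(protos[i], protos[j]))
            pairs += 1
    return total / max(pairs, 1)

def compactness_loss(statements, labels, prototypes):
    """Intra-class compactness: pull each normalized statement embedding
    toward the endpoint of its relation's prototype vector."""
    protos = [normalize(p) for p in prototypes]
    return sum(1.0 - dot(normalize(s), protos[y])
               for s, y in zip(statements, labels)) / len(statements)
```

For example, two orthogonal prototypes yield a lower dispersion loss than two identical ones, and a statement embedding aligned with its prototype incurs zero compactness loss, matching the intended geometry.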
For predictive performance, we show that our model outperforms existing state-of-the-art methods in supervised, few-shot, and zero-shot settings (Sections 4.4 and 4.2). For robustness, we show how the model generalizes to the zero-shot setting (Section 4.2) and how the prototypes regularize the decision boundary (Section 4.5). For interpretability, we visualize the learned embeddings and their corresponding prototypes (Section 4.4) and show that the clusters clearly follow the geometric structure imposed by the objective functions. The source code of the paper will be released at https://github.com/Alibaba-NLP/ProtoRE.

2. RELATED WORK

Relation learning can be broadly divided into three categories. Logic-based methods reason about relations via symbolic logic rules, adopting probabilistic graphical models or inductive logic systems to learn and infer relation-oriented rules (Cohen & Hirsh, 1994; Wang et al., 2015; Yang et al., 2017; Chen et al., 2018; Kazemi & Poole, 2018; Qu & Tang, 2019). Graph-based methods encode entities and relations into low-dimensional continuous spaces to capture the structural features of knowledge bases (Nickel et al., 2011; Bordes et al., 2013; Yang et al., 2015; Wang et al., 2014; Nickel et al., 2016; Ji et al., 2015; Lin et al., 2015; Trouillon et al., 2016; Sun et al., 2019; Balažević et al., 2019). Text-based methods, which have been widely explored recently, focus on extracting semantic features from text to learn relations. Conventional approaches to learning relations from text are mainly supervised, such as statistical supervised models (Zelenko et al., 2003; GuoDong et al., 2005; Mooney & Bunescu, 2006). As deep neural networks gained attention, a series of neural supervised models were proposed (Liu et al., 2013; Zeng et al., 2014; Xu et al., 2015; Santos et al., 2015; Zhang & Wang, 2015; Verga et al., 2016; Verga & McCallum, 2016; Li et al., 2019; Distiawan et al., 2019; Ding et al., 2019). To address the insufficiency of annotated data, distant supervision has been applied to automatically generate datasets with heuristic rules (Mintz et al., 2009). However, auto-labeled data introduces massive noise into models. Accordingly, various denoising methods have been explored to reduce noise effects in distant supervision (Zeng et al., 2015; Lin et al., 2016; Jiang et al., 2016; Han et al., 2018a; Wu et al., 2017; Qin et al., 2018a; b; Feng et al.,



Figure 1: The t-SNE visualization of relation representations and corresponding prototypes learned by our model. In the right part, s1 to s6 are examples of input statements, where red and blue mark the head and tail entities, and the italics in parentheses give the relation between them.

