UTC-IE: A UNIFIED TOKEN-PAIR CLASSIFICATION ARCHITECTURE FOR INFORMATION EXTRACTION

Abstract

Information Extraction (IE) spans several tasks with different output structures, such as named entity recognition, relation extraction and event extraction. Previously, these tasks were solved with different models because of their diverse output structures. By re-examining IE tasks, we find that all of them can be interpreted as extracting spans and span relations. We propose using the start and end tokens of a span to pinpoint the span in the text, and using the start-to-start and end-to-end token pairs of two spans to determine their relation. Hence, we can unify all IE tasks under the same token-pair classification formulation. Based on this reformulation, we propose a Unified Token-pair Classification architecture for Information Extraction (UTC-IE), in which we introduce Plusformer on top of the token-pair feature matrix. Specifically, it models axis-aware interaction with plus-shaped self-attention and local interaction with a Convolutional Neural Network over token pairs. Experiments show that our approach outperforms task-specific and unified models on all tasks in 10 datasets, and achieves better or comparable results on 2 joint IE datasets. Moreover, UTC-IE is significantly faster than state-of-the-art models on IE tasks in most datasets, which verifies the effectiveness of our architecture.

1. INTRODUCTION

Information Extraction (IE) aims to identify and classify structured information from unstructured texts (Andersen et al., 1992; Grishman, 2019). IE consists of a wide range of tasks, such as named entity recognition (NER), joint entity relation extraction (RE)¹ and event extraction (EE)². In the last decade, many paradigms have been proposed to solve IE tasks, such as sequence labeling (McCallum & Li, 2003; Huang et al., 2015; Zheng et al., 2017; Yu et al., 2020a), span-based classification (Jiang et al., 2020; Yu et al., 2020b; Wang et al., 2021; Ye et al., 2022), MRC-based methods (Levy et al., 2017; Li et al., 2020; Liu et al., 2020) and generation-based methods (Zeng et al., 2018; Yan et al., 2021a; Hsu et al., 2022). The above work mainly concentrates on solving individual tasks, but a unified model that solves all IE tasks without dedicated modules is desirable. Besides, tackling all IE tasks with one model can facilitate knowledge sharing between different tasks. Therefore, various attempts have been made to unify all IE tasks with one model structure. Wadden et al. (2019), Lin et al. (2020) and Nguyen et al. (2021) encode all IE tasks' target structures as graphs and design graph-based methods to predict them; Paolini et al. (2021) and Lu et al. (2022) solve general IE tasks in a generative way with a text-to-text or text-to-structure framework. However, graph-based models tend to be complex to design, and generative models are time-consuming to decode.

In our work, we propose a simple yet effective paradigm for unified IE. Inspired by Jiang et al. (2020), we re-examine IE tasks and consider that all of them are fundamentally span extraction (entity extraction in NER and RE, trigger classification and argument span detection in EE) or relational extraction³ (relation extraction in RE and argument role classification in EE). Based on this perspective, we further simplify and unify all IE tasks into token-pair classification tasks. Figure 1 shows how each task can be converted. Specifically, a span is decomposed into start-to-end and end-to-start token pairs. As depicted, the entity "School of Computer Science" in Figure 1(a) is decomposed into the token pairs (School, Science) and (Science, School). As for detecting the relation between two spans, we convert it into start-to-start and end-to-end token pairs from head mention to tail mention. For example, in Figure 1(b), the relation "Author" between "J.K. Rowling" and "Harry Potter novels" is decomposed into the token pairs (J.K., Harry) and (Rowling, novels).

Based on the above decomposition, we propose a Unified Token-pair Classification architecture for Information Extraction (UTC-IE). Specifically, we first apply a Biaffine model on top of the pretrained language model to get representations of token pairs. Then we design a novel Transformer to model the interactions between them. As the plus-shaped dotted lines in Figure 1 depict, token pairs in the horizontal and vertical directions carry vital information for classifying the central token pair. For span extraction, token pairs in the plus-shaped orientation either clash with or are nested with the central token pair; for example, e₂ is contained by e₁ in Figure 1(a). For relational extraction, the two constituent spans of the central token pair are located in the plus-shaped orientation; for example, in Figure 1(b), r is determined by e₁ and e₂. Therefore, we make each token pair attend only horizontally and vertically in the token-pair feature matrix. In addition, position embeddings are incorporated to keep the token pairs position-aware. Moreover, neighboring token pairs are highly likely to be informative for determining the type of the central token pair, so we apply a Convolutional Neural Network (CNN) to model the local interaction after the plus-shaped attention. Since the attention map for one token pair resembles the plus operator, we name the novel module Plusformer.

We conduct experiments in two settings. When training separately on each task, our model outperforms previous task-specific and unified models on 10 datasets covering all IE tasks. When training a single model simultaneously on all IE tasks in one dataset (which we call the joint IE task), UTC-IE achieves results better than or comparable to 2 joint IE baselines. To thoroughly analyze why our UTC-IE architecture is useful in IE tasks under the token-pair paradigm, we conduct several ablation studies. We observe that the CNN module in Plusformer plays a significant role in IE tasks because of the abundant local dependency between token pairs after the reformulation. Furthermore, owing to

¹ Joint entity relation extraction aims to extract both entities and relations. In our paper, we call it relation extraction (RE) for simplicity.
² Event extraction covers trigger extraction and argument extraction, where we first conduct argument span detection and then conduct argument role classification in our architecture.
³ In this paper, we use relational extraction to represent extracting relations between spans, which has a broader meaning than relation extraction.

Figure 1: An illustration of the token-pair decomposition for IE tasks. Each cell represents one token pair, which can be classified into pre-defined types. e, r, t, a and rol in the figures denote entity, relation, event trigger, event argument and event role. For span extraction, we use the start-to-end and end-to-start token pairs to pinpoint the span, such as entity spans e₁, e₂, argument spans a₁, a₂ and trigger span t (cells with pure color). For relational extraction, we use the start-to-start and end-to-end token pairs to represent the relation, such as r and rol₁, rol₂ (cells with gradient color). Therefore, all IE tasks can be decomposed into token-pair classifications. After the reformulation, the local dependency and interaction from the plus-shaped orientation (as the orange and blue dotted lines depict) can provide vital information to classify the central token pair.
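The token-pair decomposition illustrated in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`span_to_pairs`, `relation_to_pairs`, `build_label_matrix`), the example sentence, the entity type names `PER` and `WORK`, and the choice to store a set of labels per cell are all hypothetical. Only the pairing rules (start-to-end and end-to-start for spans, start-to-start and end-to-end for relations) and the "Author" example come from the text.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, both inclusive


def span_to_pairs(span: Span) -> List[Tuple[int, int]]:
    """Return the start-to-end and end-to-start cells that pinpoint a span."""
    start, end = span
    return [(start, end), (end, start)]


def relation_to_pairs(head: Span, tail: Span) -> List[Tuple[int, int]]:
    """Return the start-to-start and end-to-end cells, head to tail."""
    return [(head[0], tail[0]), (head[1], tail[1])]


def build_label_matrix(
    n_tokens: int,
    spans: List[Tuple[Span, str]],
    relations: List[Tuple[Span, Span, str]],
) -> List[List[Set[str]]]:
    """Fill an n x n grid of label sets, one cell per token pair.

    Keeping a set per cell is an assumption of this sketch; it simply
    lets one token pair carry more than one label.
    """
    matrix: List[List[Set[str]]] = [
        [set() for _ in range(n_tokens)] for _ in range(n_tokens)
    ]
    for span, label in spans:
        for row, col in span_to_pairs(span):
            matrix[row][col].add("span:" + label)
    for head, tail, label in relations:
        for row, col in relation_to_pairs(head, tail):
            matrix[row][col].add("rel:" + label)
    return matrix


# Hypothetical sentence built around the two mentions named in the text.
tokens = ["J.K.", "Rowling", "wrote", "the", "Harry", "Potter", "novels"]
author, work = (0, 1), (4, 6)
labels = build_label_matrix(
    len(tokens),
    spans=[(author, "PER"), (work, "WORK")],   # type names are hypothetical
    relations=[(author, work, "Author")],
)
```

Here the cells (J.K., Harry) and (Rowling, novels) carry the relation label, while (J.K., Rowling), (Rowling, J.K.), (Harry, novels) and (novels, Harry) carry the span labels; every other cell stays empty.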
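The plus-shaped attention that the introduction describes, where each token pair attends only horizontally and vertically in the token-pair feature matrix, can be sketched as follows. This is a simplified illustration rather than Plusformer itself: it uses a single head, reuses the raw cell features as queries, keys and values without learned projections, and omits the position embeddings and the CNN mentioned in the text. The function names `plus_cells` and `plus_attention`, and the choice to let a cell attend to itself, are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]  # n x n grid of d-dimensional cell features


def plus_cells(i: int, j: int, n: int) -> List[Tuple[int, int]]:
    """Cells sharing row i or column j with (i, j), including (i, j) once."""
    row = [(i, k) for k in range(n)]
    col = [(k, j) for k in range(n) if k != i]
    return row + col


def plus_attention(features: Grid, i: int, j: int) -> List[float]:
    """Scaled dot-product attention for cell (i, j) over its plus-shaped
    neighborhood. Raw features act as query, key and value (assumption)."""
    n = len(features)
    query = features[i][j]
    dim = len(query)
    cells = plus_cells(i, j, n)

    # Scaled dot-product score against every cell in the same row or column.
    scores = [
        sum(q * k for q, k in zip(query, features[r][c])) / math.sqrt(dim)
        for r, c in cells
    ]

    # Numerically stable softmax over the 2n - 1 attended cells.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the attended cell features.
    out = [0.0] * dim
    for weight, (r, c) in zip(weights, cells):
        for t in range(dim):
            out[t] += weight * features[r][c][t]
    return out


# Toy 3 x 3 grid with 2-dimensional features per token pair.
grid = [[[float(r), float(c)] for c in range(3)] for r in range(3)]
updated = plus_attention(grid, 1, 1)
```

Restricting each cell to its row and column means it attends to 2n - 1 cells instead of all n² cells in the grid.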

