STRUCTURED PREDICTION AS TRANSLATION BETWEEN AUGMENTED NATURAL LANGUAGES

Abstract

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be easily extracted. Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). We accomplish this while using the same architecture and hyperparameters for all tasks and even when training a single model to solve all tasks at the same time (multi-task learning). Finally, we show that our framework can also significantly improve the performance in a low-resource regime, thanks to better use of label semantics.

1. INTRODUCTION

Structured prediction refers to inference tasks whose output space consists of structured objects, for instance graphs representing entities and the relations between them. In the context of natural language processing (NLP), structured prediction covers a wide range of problems such as entity and relation extraction, semantic role labeling, and coreference resolution. For example, given the input sentence "Tolkien's epic novel The Lord of the Rings was published in 1954-1955, years after the book was completed", we might seek to extract the corresponding graphs (respectively, in a joint entity and relation extraction task and a coreference resolution task; see Figure 1). Can we solve all of these structured prediction tasks with the same architecture, while leveraging any latent knowledge that the pre-trained model may have about the label semantics?

In this paper, we propose to solve this problem with a text-to-text model, framing it as a task of Translation between Augmented Natural Languages (TANL). Figure 1 shows how the example above is handled within our framework for three different structured prediction tasks. The augmented languages are designed so that structured information (such as relevant entities) is easy to encode in the input, and easy to decode from the output text. We show that out-of-the-box transformer models can readily learn this augmented-language translation task. In fact, we successfully apply our framework to a wide range of structured prediction problems, obtaining new state-of-the-art results on many datasets and highly competitive results on all others. We achieve this using the same architecture and hyperparameters on all tasks; the only difference between tasks is the augmented natural language format. This is in contrast with previous approaches, which use task-specific discriminative models.
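To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of how an augmented-language output for joint entity and relation extraction could be decoded back into structured objects: each entity span is bracketed, followed by its type and, optionally, relations of the form "relation = head". The bracket syntax and the decode helper below are illustrative assumptions.

```python
import re

# Hypothetical augmented-language output for joint entity/relation extraction:
# each entity span is bracketed, with its type and optional relations after "|".
augmented = ("[ Tolkien | person ] 's epic novel "
             "[ The Lord of the Rings | book | author = Tolkien ] "
             "was published in 1954-1955 , years after the book was completed .")

def decode(text):
    """Extract (span, type, [(relation, head_span), ...]) triples from augmented text."""
    entities = []
    for match in re.finditer(r"\[ (.+?) \]", text):
        parts = [p.strip() for p in match.group(1).split("|")]
        span = parts[0]
        etype = parts[1] if len(parts) > 1 else None
        relations = []
        for rel in parts[2:]:
            name, _, head = rel.partition("=")
            relations.append((name.strip(), head.strip()))
        entities.append((span, etype, relations))
    return entities

print(decode(augmented))
# yields the entities (Tolkien, person) and (The Lord of the Rings, book)
# together with the relation (author = Tolkien)
```

Because both input and output are plain text with light markup, the same pre-trained text-to-text model can be used unchanged across tasks; only the markup conventions differ.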
The choice of input and output format is crucial: by using annotations in a format that is as close as possible to natural language, we enable transfer of latent knowledge that the pre-trained model has about the task, improving performance especially in a low-data regime. Nested entities and an arbitrary number of relations, typical sources of complication for previous approaches, are handled neatly by our models. We implement an alignment algorithm to robustly match the structured information extracted from the output sentence with the corresponding tokens in the input sentence. We also leverage our framework to train a single model to solve all tasks at the same time, and show that it achieves comparable or better results than training separately on each task. To the best of our knowledge, this is the first model to handle such a variety of structured prediction tasks without any additional task-specific modules.

To summarize, our key contributions are the following.

1. We introduce TANL, a framework to solve several structured prediction tasks in a unified way, with a common architecture and without the need for task-specific modules. We cast structured prediction tasks as translation tasks by designing augmented natural languages that allow us to encode structured information as part of the input or output. Robust alignment ensures that extracted structure is matched with the correct parts of the original sentence (Section 3).

2. We apply our framework to (1) joint entity and relation extraction; (2) named entity recognition; (3) relation classification; (4) semantic role labeling; (5) coreference resolution; (6) event extraction; and (7) dialogue state tracking (Sections 4 and 5). In all cases we achieve results at least comparable to the current state of the art, and we achieve new state-of-the-art performance on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012).

3. We train a single model simultaneously on all tasks (multi-task learning), obtaining results comparable to or better than those of single-task models (Section 5.1).

4. We show that, thanks to the improved transfer of knowledge about label semantics, we can significantly improve performance in the few-shot regime over previous approaches (Section 5.2).
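The alignment step mentioned above can be sketched as a fuzzy search over same-length token windows of the input sentence. The sketch below is a hypothetical stand-in for the paper's actual algorithm (Section 3), using character-level similarity from Python's difflib; the 0.8 threshold is an illustrative assumption.

```python
import difflib

def align(span, input_tokens):
    """Find the token window in the input that best matches an extracted span.

    Returns (start, end) token indices, or None if no window is similar
    enough. Illustrative stand-in for a robust alignment step: it tolerates
    small generation errors in the decoded span.
    """
    span_tokens = span.split()
    n = len(span_tokens)
    best, best_score = None, 0.0
    for i in range(len(input_tokens) - n + 1):
        window = " ".join(input_tokens[i:i + n])
        score = difflib.SequenceMatcher(None, window, " ".join(span_tokens)).ratio()
        if score > best_score:
            best, best_score = (i, i + n), score
    return best if best_score >= 0.8 else None

tokens = "Tolkien 's epic novel The Lord of the Rings was published in 1954-1955".split()
print(align("The Lord of the Ring", tokens))  # matches despite the missing "s"
print(align("Rings of Power", tokens))        # no sufficiently similar window
```

A matching step of this kind is what lets the framework recover exact input spans even when the generated output text is not a verbatim copy of the input.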



Figure 1: Our TANL model translates between input and output text in augmented natural language, and the output is then decoded into structured objects.

