STRUCTURED PREDICTION AS TRANSLATION BETWEEN AUGMENTED NATURAL LANGUAGES

Abstract

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be easily extracted. Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). We accomplish this while using the same architecture and hyperparameters for all tasks and even when training a single model to solve all tasks at the same time (multi-task learning). Finally, we show that our framework can also significantly improve the performance in a low-resource regime, thanks to better use of label semantics.

1. INTRODUCTION

Structured prediction refers to inference tasks where the output space consists of structured objects, for instance graphs representing entities and relations between them. In the context of natural language processing (NLP), structured prediction covers a wide range of problems such as entity and relation extraction, semantic role labeling, and coreference resolution. For example, given the input sentence "Tolkien's epic novel The Lord of the Rings was published in 1954-1955, years after the book was completed", we might seek to extract the following graphs (respectively in a joint entity and relation extraction, and a coreference resolution task):

[Figure: example outputs expressed as augmented natural languages for the input sentence above.
  Joint entity and relation extraction: [ Tolkien | person ]'s epic novel [ The Lord of the Rings | book | author = Tolkien ] was published in 1954-1955, years after the book was completed.
  Semantic role labeling: Tolkien's epic novel [ The Lord of the Rings | subject ] [ was published | predicate ] [ in 1954-1955 | temporal ], years after the book was completed.
  Coreference resolution: [ Tolkien | head ]'s epic novel [ The Lord of the Rings | head ] was published in 1954-1955, years after the [ book | The Lord of the Rings ] was completed.]

A common approach addresses structured prediction by employing task-specific discriminators for the various types of relations or attributes, on top of a pretrained transformer such as BERT (Devlin et al., 2019). Yet, this presents two limitations. First, a discriminative classifier cannot easily leverage latent knowledge that the pretrained model may already have about the meaning (semantics) of task labels such as person and author. For instance, knowing that a person can write a book would greatly simplify learning the author relation in the example above. However, discriminative models are usually trained without knowledge of the label semantics (their targets are class numbers), thus preventing such positive transfer. Second, since the architecture of a discriminative model is adapted to the specific task, it is difficult to train a single model to solve many tasks, or to fine-tune a model from one task to another (transfer learning) without changing the task-specific components of the discriminator. Hence, our main question is: can we design a framework to solve different
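To make the augmented-language format concrete, the bracketed annotations above can be decoded back into entities and relation triples with simple string processing. The following is a minimal sketch, not the paper's actual decoding procedure (which also aligns generated spans back to the input); the function name and output representation are our own illustrative choices.

```python
import re

def parse_augmented(text):
    """Decode an augmented natural language string of the form
    '[ span | type | relation = head ]' into entity and relation lists.
    Minimal illustrative sketch; a full decoder would also handle
    malformed generations and align spans to the original input."""
    entities, relations = [], []
    for match in re.finditer(r"\[\s*(.+?)\s*\]", text):
        parts = [p.strip() for p in match.group(1).split("|")]
        span, tags = parts[0], parts[1:]
        for tag in tags:
            if "=" in tag:  # relation annotation: 'relation = head entity'
                rel, head = (s.strip() for s in tag.split("=", 1))
                relations.append((span, rel, head))
            else:           # entity type annotation
                entities.append((span, tag))
    return entities, relations

ents, rels = parse_augmented(
    "[ Tolkien | person ]'s epic novel "
    "[ The Lord of the Rings | book | author = Tolkien ] was published in 1954-1955"
)
# ents: [('Tolkien', 'person'), ('The Lord of the Rings', 'book')]
# rels: [('The Lord of the Rings', 'author', 'Tolkien')]
```

Because the output is itself (augmented) natural language, the same sequence-to-sequence model can produce it for any of the tasks above, and the task-relevant structure is recovered by post-processing like this rather than by a task-specific classification head.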

