GROOT: CORRECTIVE REWARD OPTIMIZATION FOR GENERATIVE SEQUENTIAL LABELING

Abstract

Sequential labeling is a fundamental NLP task, forming the backbone of many applications. Supervised learning of Seq2Seq models has shown great success on these problems. However, the training objectives remain significantly disconnected from the metrics and desiderata we care about in practice. For example, a practical sequence tagging application may want to optimize for a certain precision-recall trade-off (of the top-k predictions), which is quite different from the standard objective of maximizing the likelihood of the gold labeled sequence. To bridge this gap, we propose GROOT, a simple yet effective framework for Generative Reward Optimization Of Text sequences. GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function. Using an iterative training regime, we first generate prediction candidates, then correct errors in them, and finally contrast those candidates (based on their reward values). As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics. Furthermore, GROOT leads to improvements in the overall decoder distribution, as evidenced by the quality gains of the top-k candidates.
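The generate-correct-contrast loop summarized above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the generator, corrector, and reward here are stand-in functions, and a real training step would update the decoder distribution toward higher-reward candidates rather than merely rank them.

```python
def groot_step(generate, correct, reward, x, k=4):
    """One GROOT-style iteration for a single input x:
    (1) generate k candidate label sequences,
    (2) apply a correction step to each candidate,
    (3) contrast candidates by ranking them with the black-box reward."""
    candidates = [generate(x, i) for i in range(k)]        # step 1: generate
    candidates = [correct(c) for c in candidates]          # step 2: correct
    ranked = sorted(candidates, key=reward, reverse=True)  # step 3: contrast
    # A training step would push the decoder toward high-reward
    # candidates and away from low-reward ones; here we just rank.
    return ranked

# Toy usage: "generation" proposes noisy labelings, "correction" fixes
# a known typo, and the "reward" checks exact match with the gold label.
gold = "[PRODUCT blender]"

def gen(x, i):
    return x if i == 0 else x.replace("blender", "blendr")

def fix(c):
    return c.replace("blendr", "blender")

def reward(c):
    return float(c == gold)

print(groot_step(gen, fix, reward, gold, k=2))
# -> ['[PRODUCT blender]', '[PRODUCT blender]']
```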

1. INTRODUCTION

Figure 1: Results for our model (GROOT) vs. the NLL baseline, demonstrating the precipitous drop-off in quality of NLL model predictions outside the top-1.

Sequential labeling tasks are ubiquitous among NLP applications. Tasks ranging from syntactic analysis (e.g., POS tagging and phrase chunking) to semantic analysis (e.g., named entity recognition, slot filling, and query segmentation) are critical components in end-to-end applications such as search engines and goal-oriented dialog systems. Advances in pretraining of generative language models (LMs) like T5 (Raffel et al., 2020) and mT5 (Xue et al., 2021) have enabled us to use the same training strategy seamlessly across these diverse sequence labeling tasks: we can fine-tune a pretrained LM by maximizing the likelihood of generating the ground-truth (human-annotated) labeled data.

However, in practice, the metrics and constraints we may care about remain fairly disconnected from the standard Negative Log-Likelihood (NLL) objective used to train these models. To understand this better, consider an example of an entity recognition model within an e-commerce system. This model would typically be trained on data of the following form:

Input: black & decker blender under 100
Label: [BRAND black & decker] [PRODUCT blender] [PRICE under 100]

While this e-commerce pipeline could utilize the model's predictions in different ways, a likely use is in retrieving candidates that match the predicted annotations. However, with models being imperfect, even well-trained models may make errors like:
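To make concrete how a task reward differs from the NLL objective, the bracketed label format above can be scored with a span-level metric. The sketch below is illustrative only (the paper's actual reward functions may differ): it parses a labeled sequence into (tag, text) spans and computes span F1 against the gold labeling, treating the metric as a black-box reward.

```python
import re

def parse_spans(labeled):
    """Parse a bracketed label sequence such as
    '[BRAND black & decker] [PRODUCT blender] [PRICE under 100]'
    into a set of (tag, text) spans."""
    return set(re.findall(r"\[(\w+) ([^\]]+)\]", labeled))

def span_f1(predicted, gold):
    """A simple black-box reward: span-level F1 between a predicted
    and a gold labeled sequence."""
    pred, ref = parse_spans(predicted), parse_spans(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)                      # exactly matching spans
    precision, recall = tp / len(pred), tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = "[BRAND black & decker] [PRODUCT blender] [PRICE under 100]"
pred = "[BRAND black & decker] [PRODUCT blender under 100]"
print(span_f1(pred, gold))  # only 1 of 2 predicted spans is correct
```

Note that a prediction can have high token-level likelihood yet low reward: merging the PRODUCT and PRICE spans above drops span F1 sharply, even though the output differs from the gold sequence by only a few tokens.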

