GENERATING SEQUENCES BY LEARNING TO [SELF-]CORRECT

Abstract

Sequence generation applications require satisfying semantic constraints, such as ensuring that programs are correct, using certain keywords, or avoiding undesirable content. Language models, whether fine-tuned or prompted with few-shot demonstrations, frequently violate these constraints, and lack a mechanism to iteratively revise their outputs. Moreover, some powerful language models are of extreme scale or inaccessible, making it inefficient, if not infeasible, to update their parameters for task-specific adaptation. We present SELF-CORRECTION, an approach that decouples an imperfect base generator (an off-the-shelf language model or supervised sequence-to-sequence model) from a separate corrector that learns to iteratively correct imperfect generations. To train the corrector, we propose an online training procedure that can use either scalar or natural language feedback on intermediate imperfect generations. We show that SELF-CORRECTION improves upon the base generator in three diverse generation tasks (mathematical program synthesis, lexically constrained generation, and toxicity control), even when the corrector is much smaller than the base generator.

1. INTRODUCTION

The standard practice for natural language generation tasks is inherently single-pass: applying a decoding procedure to either a few-shot prompted language model or one tuned for a given task, then considering the generation as "finished" (e.g. Radford et al. (2019); Brown et al. (2020); Chen et al. (2021)). Powerful generation models often meet most of the task requirements, yet miss a few (e.g., omitting a subset of keywords), or generate incorrect hypotheses that nevertheless provide useful structure (e.g., a correct problem-solving strategy with a missing step). However, after generating even a slightly sub-optimal sequence, the single-pass paradigm requires models to "start from scratch", effectively discarding work already done. A more natural, intuitive approach is leveraging the generation as a useful starting point to refine into a higher quality output. To formalize this intuition, we introduce Self-Correction for Sequence Generation. Figure 1 demonstrates its central principle: a generation model is re-framed as a base generator, which produces a reasonable initial hypothesis but does not need to solve the task in one pass, and a second module, the corrector, trained to make up the difference between the hypothesis and an optimal solution. Neither the generator nor the corrector must solve the full task in one pass, and the corrector can be applied multiple times to iteratively improve the output (§3.6). We propose a simple, general procedure for training the corrector (Figure 2) by pairing generator outputs with carefully selected targets. The result is a system which self-corrects, producing outputs through multiple generation passes and breaking the task into steps that can be solved by dedicated and efficient sub-systems.
A corrector model improves the base generator on three such tasks in our experiments: mathematical program synthesis (§3.1), lexically constrained generation (§3.2), and toxicity reduction (§3.3). The trained corrector model even transfers to a larger generator, achieving performance similar to training from scratch (§3.4). Finally, we explore introducing a third module to the Self-Correction system (§3.5), explicitly using natural language feedback to guide corrections, with promising results. Self-Correction is an exciting path to build on the generations of strong models, with efficient, effective, and transferable corrector networks.

2. SELF-CORRECTING SEQUENCE GENERATORS

A typical autoregressive text generator (e.g. GPT-3 (Brown et al., 2020)) maps an input prompt to a distribution over outputs using a single parameterized module (e.g. a large transformer), $p_0(y|x)$. We explore an alternative that decomposes into two modules, a base generator and a corrector,

$$p(y|x) = \sum_{y_0} \underbrace{p_0(y_0|x)}_{\text{generator}} \; \underbrace{p_\theta(y|y_0, x)}_{\text{corrector}} \quad (1)$$

where the generator provides an initial hypothesis that is refined by the corrector. In practice, the corrector can be applied multiple times, $p(y_T|x) = \sum_{y_0} \sum_{y_1} \cdots \sum_{y_{T-1}} p_0(y_0|x) \prod_t p_\theta(y_{t+1}|y_t, x)$. Since a model of this form can both generate and correct its generations, we call it a Self-Corrector. Self-correctors have several unique properties compared to typical generators. First, a self-corrector decouples generation and correction, allowing us to freely parameterize each module, for instance by prompting a single language model or using two different language models. In this paper, we develop a framework to train a separate corrector model (§2.1). We find that the resulting self-corrector improves upon the generator alone (§3), even when the corrector is much smaller (§3.4). Second, since the generator and the corrector are separated, we can keep the generator as a general-purpose language model and train the corrector with different objectives for different task requirements. In §2.1, we propose a training algorithm for the corrector that is dedicated to improving generations, where the improvement can be in any aspect, measured by scalar values. Third, the corrector can receive explicit feedback about intermediate generations to guide subsequent generations. Formally, $p(y|x) = \sum_{y_0} p_0(y_0|x)\, p_\theta(y|y_0, x, f(y_0))$, where $f$ is the feedback. The feedback can take many forms, e.g. a sentence or a compiler trace. In contrast, a typical generator that generates in a single pass does not leverage feedback on its own generation.
In this paper, we show that the corrector can learn to exploit explicit natural language feedback to achieve better performance (§3.5). Next, we describe our training framework for the corrector.
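To make the decomposition concrete, the following is a minimal sketch of self-corrector inference (Eq. 1 applied iteratively). The callables `generator`, `corrector`, and `value_fn` are hypothetical stand-ins for a prompted or fine-tuned language model, the trained corrector, and the scalar quality function v(y); the early-stopping rule (stop when the value no longer improves) is one illustrative choice, not the paper's prescribed decoding procedure.

```python
def self_correct(x, generator, corrector, value_fn, max_steps=3):
    """Sketch of Self-Correction inference: sample an initial hypothesis
    from the base generator, then repeatedly apply the corrector."""
    y = generator(x)                      # y_0 ~ p_0(. | x)
    for _ in range(max_steps):
        y_next = corrector(x, y)          # y_{t+1} ~ p_theta(. | y_t, x)
        if value_fn(y_next) <= value_fn(y):
            break                         # no improvement: keep current output
        y = y_next
    return y
```

A feedback-aware variant would simply pass `f(y)` as an extra argument to `corrector`, matching the formulation $p_\theta(y|y_0, x, f(y_0))$.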

2.1. LEARNING A CORRECTOR

Our goal is to have the generator generate an initial hypothesis, then improve the hypothesis with the corrector (Eq. 1). We train the corrector to improve the quality of a hypothesis, while staying as close as possible to the original hypothesis. Here, quality is measured with a scalar value function v(y) which is accessible at training time (e.g. 0/1 indicator of program correctness, a toxicity score).
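The pairing criterion above (improve value while staying close to the original hypothesis) can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: candidate targets are drawn from the same pool of sampled hypotheses, and "closeness" is approximated with a hypothetical string similarity via `difflib`; `value_fn` stands in for the scalar v(y) described in the text.

```python
import difflib

def make_correction_pairs(x, hypotheses, value_fn):
    """For each hypothesis, pick a higher-value candidate that stays as
    close as possible to it, yielding (input, hypothesis, target) triples
    for corrector training."""
    pairs = []
    for y in hypotheses:
        better = [y2 for y2 in hypotheses if value_fn(y2) > value_fn(y)]
        if not better:
            continue  # no value-improving target available for this hypothesis
        # Target: the most similar hypothesis among those with higher value.
        target = max(
            better,
            key=lambda y2: difflib.SequenceMatcher(None, y, y2).ratio(),
        )
        pairs.append((x, y, target))
    return pairs
```

Preferring similar targets encourages the corrector to learn small, local edits rather than wholesale rewrites, which is what makes reusing the generator's partial work possible.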



Self-Correction builds on past work on correction in the code and text domains (e.g. Yasunaga et al. (2021); Faltings et al. (2021)), but provides a unified formalism with minimal assumptions about data and feedback that applies generally to diverse tasks.

Figure 1: SELF-CORRECTORs decompose generation into a base generator that proposes an initial hypothesis, and a corrector that iteratively improves its quality.

