DIFFUSER: DIFFUSION VIA EDIT-BASED RECONSTRUCTION

Abstract

In text generation, models that generate text from scratch one token at a time are currently the dominant paradigm. Despite being performant, these models lack the ability to revise existing text, which limits their usability in many practical scenarios. We address this with DIFFUSER (Diffusion via Edit-based Reconstruction), a new edit-based generative model for text based on denoising diffusion models, a class of models that use a Markov chain of denoising steps to incrementally generate data. DIFFUSER is not only a strong generative model in general, rivalling autoregressive models on several tasks spanning machine translation, summarization, and style transfer; it can also perform other varieties of generation that standard autoregressive models are not well-suited for. For instance, we demonstrate that DIFFUSER makes it possible for a user to condition generation on a prototype or an incomplete sequence, and to continue revising based on previous edit steps.

1. INTRODUCTION

Revision and editing are central to how humans produce content; we write and revise emails and papers, gradually produce works of art, and iterate on plans for a project. Despite this, the dominant paradigm in text generation is purely autoregressive, producing text left-to-right in a single pass (Bengio et al., 2003). Although models employing this single-pass form of generation are highly performant, they are limited by their inability to refine existing text. To address this, we propose DIFFUSER: Diffusion via Edit-based Reconstruction, a flexible method to apply edit-based generative processes to arbitrary text generation tasks. Specifically, we take inspiration from diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020), generative models that generate by way of incremental denoising steps, and adapt this approach to the text generation paradigm with a formulation similar to natural editing processes. Prior work on text generation either focuses on improving the performance of standard autoregressive (AR) models through larger models and datasets (Vaswani et al., 2017; Sutskever et al., 2014; Radford et al.; Brown et al., 2020) or on proposing new, non-autoregressive approaches (Gu et al., 2017; Ghazvininejad et al., 2019; Gu et al., 2019) to improve general modes of text generation. A so far separate line of work models text edits for specific tasks, e.g., style transfer (Reid & Zhong, 2021; Malmi et al., 2020), sentence fusion (Malmi et al., 2019), and grammatical error correction (Dale & Kilgarriff, 2011). DIFFUSER unifies these two perspectives by enabling edit processes to be applied to general-purpose text generation without compromising performance or requiring external supervised data (Guu et al., 2018). This design enables it

