COCON: A SELF-SUPERVISED APPROACH FOR CONTROLLED TEXT GENERATION

Abstract

Pretrained Transformer-based language models (LMs) display remarkable natural language generation capabilities. Given their immense potential, controlling the text generation of such LMs is attracting increasing attention. While there are studies that seek to control high-level attributes (such as sentiment and topic) of generated text, there is still a lack of more precise control over its content at the word- and phrase-level. Here, we propose Content-Conditioner (CoCon) to control an LM's output text with a content input, at a fine-grained level. In our self-supervised approach, the CoCon block learns to help the LM complete a partially-observed text sequence by conditioning with content inputs that are withheld from the LM. Through experiments, we show that CoCon can naturally incorporate target content into generated texts and control high-level text attributes in a zero-shot manner.

1. INTRODUCTION

Transformer-based (Vaswani et al., 2017; Tay et al., 2020) pretrained language models (LMs) have led a wave of new advances in natural language processing tasks, both as a means to extract contextualized word embeddings (Devlin et al., 2018; Dai et al., 2019b; Yang et al., 2019) and as text generators (Radford et al., 2019; Brown et al., 2020). These LMs are trained on huge amounts of text corpora to predict next tokens through a log-likelihood objective. Given their remarkably fluent text generation, there is growing interest in controlling the output texts of such LMs (Keskar et al., 2019; Dathathri et al., 2019). Approaches like training a modified LM from scratch to incorporate target text attributes (Keskar et al., 2019) can be expensive, while finetuning pretrained LMs for specific attributes (Ziegler et al., 2019) limits the scope of text control. Without changing the architecture or weights of pretrained LMs, one promising approach (PPLM) (Dathathri et al., 2019) controls generated text through attribute models. Though effective in controlling high-level text attributes such as topic and sentiment, the same target attribute may yield text samples with vastly different content at the word- and phrase-levels, leaving a gap for more fine-grained control over the content of LM-generated texts.

We conceptualize Content-Conditioner (CoCon) as an approach to narrow this gap by guiding a pretrained LM's text outputs through the incorporation of a content input. This content input can take the form of a text sequence whose content we would like to condition on for text generation. Essentially, CoCon comprises two parts: 1) a pretrained LM, whose layers are split into a lower portion (LM α) and an upper portion (LM β), and 2) a CoCon block interleaved between them. The CoCon block incorporates the representations of the content input into the intermediate text representations from LM α before passing the content-conditioned representations into LM β for generation.

To train the CoCon block, we propose a self-supervised learning approach in which the training data consist of text samples generated by the pretrained LM itself (§3.1). By splitting each text sequence into two segments ([x^a; x^b]), CoCon learns through a self-reconstruction objective to help the LM reconstruct the missing latter segment (x^b) by taking x^b itself as the content input. We use content masking for CoCon and also propose other loss functions, such as cycle reconstruction, to condition on content from divergent sources while producing high-quality texts. Since the CoCon block's size is a small fraction of the LM's and no finetuning is conducted on the LM's weights, the training cost is significantly lower than training an LM from scratch. We show that CoCon's fine-grained content control can be extended to also influence higher-level text attributes such as topic and sentiment in a zero-shot manner.
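To make the conditioning pipeline concrete, below is a minimal PyTorch-style sketch (not the paper's implementation) of how a CoCon-style block could fuse content representations into the intermediate states between the two halves of a frozen LM; `lm_alpha`, `lm_beta`, and the attention-based fusion are simplified stand-ins for the components described above.

```python
# A simplified sketch (assumed shapes: batch x seq_len x hidden_dim);
# lm_alpha / lm_beta stand in for the lower and upper halves of a frozen LM.
import torch
import torch.nn as nn

class CoConBlock(nn.Module):
    """Fuses content representations into intermediate text representations
    via attention (a simplified stand-in for the paper's CoCon block)."""
    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.ln1 = nn.LayerNorm(hidden_dim)
        self.ln2 = nn.LayerNorm(hidden_dim)

    def forward(self, h_text: torch.Tensor, h_content: torch.Tensor) -> torch.Tensor:
        # Queries come from the text being generated; keys/values also include
        # the content input, so its representations can be attended to.
        kv = torch.cat([h_content, h_text], dim=1)
        attended, _ = self.attn(self.ln1(h_text), kv, kv)
        h = h_text + attended
        return h + self.ff(self.ln2(h))

def cocon_forward(lm_alpha, cocon_block, lm_beta, text_ids, content_ids):
    """Content-conditioned forward pass:
    lower LM layers -> CoCon block -> upper LM layers -> next-token logits."""
    h_text = lm_alpha(text_ids)        # intermediate representations of the prompt
    h_content = lm_alpha(content_ids)  # representations of the content input
    h_cond = cocon_block(h_text, h_content)
    return lm_beta(h_cond)
```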

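The self-reconstruction objective can likewise be sketched roughly as follows; content masking, loss weighting, and the auxiliary losses (e.g., cycle reconstruction) are omitted, and `lm_alpha`, `cocon_block`, `lm_beta`, and `split_idx` are hypothetical stand-ins rather than the paper's exact formulation.

```python
import torch.nn.functional as F

def self_reconstruction_loss(lm_alpha, cocon_block, lm_beta, x, split_idx):
    """Rough sketch of the self-reconstruction objective: split an LM-generated
    sample x into [x^a; x^b], withhold x^b from the LM, and train the CoCon
    block so the LM reconstructs x^b when x^b itself is the content input."""
    x_b = x[:, split_idx:]

    h = lm_alpha(x)               # teacher-forced representations of the full sequence
    h_c = lm_alpha(x_b)           # the withheld segment doubles as the content input
    h_cond = cocon_block(h, h_c)  # fuse content into the text representations
    logits = lm_beta(h_cond)      # next-token logits at every position

    # Cross-entropy only over positions whose next token lies in x^b;
    # only the CoCon block's parameters are updated, the LM stays frozen.
    pred = logits[:, split_idx - 1:-1]
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), x_b.reshape(-1))
```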
