CTRLSUM: TOWARDS GENERIC CONTROLLABLE TEXT SUMMARIZATION

Abstract

Current summarization systems yield generic summaries that are disconnected from users' preferences and expectations. To address this limitation, we present CTRLsum, a novel framework for controllable summarization. Our approach enables users to control multiple aspects of generated summaries by interacting with the summarization system through textual input in the form of a set of keywords or descriptive prompts. Using a single unified model, CTRLsum is able to achieve a broad scope of summary manipulation at inference time without requiring additional human annotations or pre-defining a set of control aspects during training. We quantitatively demonstrate the effectiveness of our approach on three domains of summarization datasets and five control aspects: 1) entity-centric and 2) length-controllable summarization, 3) contribution summarization on scientific papers, 4) invention purpose summarization on patent filings, and 5) question-guided summarization on news articles in a reading comprehension setting. Moreover, when used in a standard, uncontrolled summarization setting, CTRLsum achieves state-of-the-art results on the CNN/DailyMail dataset.

1. INTRODUCTION

Neural summarization systems aim to compress a document into a short paragraph or sentence while preserving key information. Summarization systems largely fall into two categories: extractive summarization, which extracts important portions of a document (Cheng & Lapata, 2016; Nallapati et al., 2017; Narayan et al., 2018), and abstractive summarization, which freely generates novel sentences (Rush et al., 2015; See et al., 2017; Paulus et al., 2018) and can thus produce coherent and fluent summaries more flexibly. In this paper we focus on abstractive summarization.

Typically, abstractive summarization methods take a document as input and yield a generic summary covering the information the model identifies as important. However, content of interest is user-dependent: summaries should select information with respect to a user's preferences. For example, Figure 1 shows an NBA basketball news article whose reference summary describes several match results; fans of particular basketball stars on these teams, such as LeBron James or Stephen Curry, might instead only be interested in the matches those players appeared in and would like to know their scores as well. Motivated by this, we focus on controllable summarization, which allows users to manipulate the summaries produced by the model. We propose CTRLsum, a framework that controls summaries through control tokens in the form of a set of keywords or descriptive prompts. At training time, the model learns to predict summaries conditioned on both the source document and keywords that serve as external guidance. During inference, keywords and optional prompts, which are target prefixes that constrain decoding, are combined as control tokens to convey user preferences, as shown in Figure 1. Keywords and prompts are complementary.
Prompts alone do not perform well in many cases, such as entity- or length-controlled summarization, as our preliminary experiments imply, but keywords can achieve these goals flexibly, for example by using an entity as a keyword or by varying the number of keywords to control entities and length respectively. Conversely, keywords struggle in more open-ended scenarios, such as summarizing the list of contributions of a scientific paper, while constraining decoding with the prompt "the main contributions of this paper are:(1)" is likely sufficient to achieve the goal. CTRLsum is trained using only keywords as additional input, which can be easily identified from training summaries. It requires neither extra human annotations nor pre-defined control aspects for training, yet is flexible enough to achieve a broad scope of text manipulation, as we will show in this paper. In contrast, prior work relies primarily on pre-defined "control codes" (Fan et al., 2018; Liu et al., 2018; Keskar et al., 2019), and thus needs annotations for training and cannot easily generalize to unseen control aspects at test time.

We use pretrained BART (Lewis et al., 2019) as the underlying architecture and perform experiments on datasets from three distinct domains: CNN/Dailymail news articles (Hermann et al., 2015), arXiv scientific papers (Cohan et al., 2018), and BIGPATENT patent documents (Sharma et al., 2019). We quantitatively evaluate CTRLsum on five control aspects: (1) entity-centric (§4.2) and (2) length-controllable summarization (§4.3), (3) summarizing the contributions of scientific papers, (4) summarizing the purpose of an invention (§4.4), and (5) summarizing answers to given questions in a zero-shot reading comprehension setting (§4.5). Notably, our approach also achieves comparable or superior performance to the strong BART summarization model on all datasets in a standard, uncontrolled setting (§4.6), leading to state-of-the-art results on the CNN/Dailymail dataset.
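To make the complementary roles of keywords and prompts concrete, the sketch below assembles control signals for a few of the aspects above. This is a hypothetical illustration only: the `build_control_tokens` helper, the `" | "` delimiter, and the field names are our assumptions, not the paper's actual interface.

```python
def build_control_tokens(keywords=None, prompt=None, delim=" | "):
    """Combine optional keywords (extra encoder-side input) and an
    optional prompt (a target prefix that constrains decoding) into
    one control signal for the summarizer."""
    return {
        "keywords": delim.join(keywords) if keywords else "",
        "decoder_prefix": prompt or "",
    }

# Entity control: use the entity of interest as the keyword.
entity_ctrl = build_control_tokens(keywords=["Stephen Curry"])

# Length control: vary how many keywords are supplied.
long_ctrl = build_control_tokens(
    keywords=["Warriors", "Curry", "Thompson", "playoff"])

# Contribution summarization: a prompt alone constrains the decoder.
contrib_ctrl = build_control_tokens(
    prompt="the main contributions of this paper are:(1)")
```

The same trained model serves every aspect; only the control tokens passed at inference time change.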

2. CTRLSUM

2.1. OVERVIEW

Unconstrained neural summarization methods are trained to learn the conditional distribution p(y|x), where x and y represent the source document and summary respectively. The generated summaries depend solely on the document x, without human involvement. To control the output summaries, we propose using additional control tokens z to represent user preferences and training a summarization model that predicts the conditional distribution p(y|x, z). The control tokens z include keywords as extra inputs during training and inference. They can also optionally include prompts at test time to further constrain the decoding process. As shown in Figure 1, control tokens (in the form of keywords, prompts, or a combination of both) act as an interface between users and an otherwise black-box neural model, providing a flexible way for users to explicitly control automatic summarization. Next we describe how to obtain automatic keywords for training, as well as potential applications at test time.
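A minimal sketch of this conditioning scheme, assuming the keywords z are simply prepended to the source document x so that a standard sequence-to-sequence model can be trained on p(y|x, z) with the usual cross-entropy loss. The `" => "` separator and `" | "` delimiter here are illustrative assumptions, not the paper's exact tokenization.

```python
def format_model_input(document, keywords, sep=" => ", delim=" | "):
    """Prepend control keywords z to the source document x, so the
    encoder sees both and the decoder learns p(y | x, z)."""
    return delim.join(keywords) + sep + document

# A training pair would then be (formatted input, reference summary);
# at test time the user (or an automatic tagger) supplies the keywords.
src = format_model_input(
    "Dwyane Wade scored 21 of his 32 points in the first half ...",
    keywords=["Wade", "Heat"],
)
```

Because the conditioning is purely textual, no architectural change to the underlying encoder-decoder is needed.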

2.2. AUTOMATIC KEYWORD EXTRACTION

In addition to extracting keywords from training data to train the model, CTRLsum also features an automatic keyword extraction mechanism at test time, which can be used to suggest keywords automatically when the user does not provide any.
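As a rough illustration of this kind of pipeline, the sketch below derives keywords from a (document, summary) pair by greedily selecting source sentences that overlap the reference summary and then keeping the summary words they contain. This is a simplified proxy: the paper's actual selection criterion, stop-word handling, and tokenization are not reproduced here.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "as", "is"}

def tokenize(text):
    """Lowercase word tokenizer (a crude stand-in for real tokenization)."""
    return re.findall(r"[a-z']+", text.lower())

def greedy_select(sentences, summary_words, max_sents=3):
    """Greedily pick source sentences that most increase word overlap
    with the reference summary (a simple proxy for oracle sentence
    selection against the reference)."""
    selected, covered = [], set()
    for _ in range(max_sents):
        best, best_gain = None, 0
        for sent in sentences:
            if sent in selected:
                continue
            gain = len((set(tokenize(sent)) & summary_words) - covered)
            if gain > best_gain:
                best, best_gain = sent, gain
        if best is None:
            break
        selected.append(best)
        covered |= set(tokenize(best)) & summary_words
    return selected

def extract_keywords(document_sentences, reference_summary):
    """Keep, in document order, the non-stop-word summary words that
    appear in the selected source sentences."""
    summary_words = set(tokenize(reference_summary)) - STOPWORDS
    keywords = []
    for sent in greedy_select(document_sentences, summary_words):
        for word in tokenize(sent):
            if word in summary_words and word not in keywords:
                keywords.append(word)
    return keywords
```

At training time such keywords pair each document with its reference summary; at test time the same mechanism can run against a model-generated draft or salient sentences to propose keywords to the user.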



Code and model checkpoints will be made public after the review period.



Figure 1: Workflow of the CTRLsum framework at inference time. Users interact with summaries through textual control tokens in the form of keywords or prompts. Keywords are required as input during training and testing, while prompts are optionally used at test time. Dashed lines represent optional paths: control tokens can come from the source article, the user, or both. The right portion of the figure shows actual outputs from CTRLsum.

Dwyane Wade scored 21 of his 32 points in the first half and Goran Dragic added 20 as the Miami Heat handed LeBron James another loss on his former home floor with a 106-92 victory over the Cleveland Cavaliers on Monday …… [ignoring 60 tokens] James scored 16 of his 26 points in the fourth quarter for Cleveland, which had its four-game winning streak snapped. Kyrie Irving added 21. Klay Thompson scored 26 points, and Stephen Curry had 19 points and nine assists as the Golden State Warriors secured a playoff spot before beating the depleted Los Angeles Lakers 108-105 …… [ignoring 400 tokens]

