TEXT SUMMARIZATION WITH ORACLE EXPECTATION

Abstract

Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document. Since most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy, different labeling algorithms have been proposed to extrapolate oracle extracts for model training. In this work, we identify two flaws with the widely used greedy labeling approach: it delivers suboptimal and deterministic oracles. To alleviate both issues, we propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels. We define a new learning objective for extractive summarization which incorporates learning signals from multiple oracle summaries and prove it is equivalent to estimating the oracle expectation for each document sentence. Without any architectural modifications, the proposed labeling scheme achieves superior performance on a variety of summarization benchmarks across domains and languages, in both supervised and zero-shot settings. 1

1. INTRODUCTION

Summarization is the process of condensing a source text into a shorter version while preserving its information content. Thanks to neural encoder-decoder models (Bahdanau et al., 2015; Sutskever et al., 2014), Transformer-based architectures (Vaswani et al., 2017), and large-scale pretraining (Liu & Lapata, 2019; Zhang et al., 2020a; Lewis et al., 2020), the past few years have witnessed a huge leap forward in summarization technology. Abstractive methods fluently paraphrase the main content of the input, using a vocabulary different from that of the original document, while extractive approaches are less creative: they produce summaries by identifying and subsequently concatenating the most important sentences in a document, but in doing so they avoid hallucinations, false statements, and inconsistencies.

Neural extractive summarization is typically formulated as a sequence labeling problem (Cheng & Lapata, 2016), assuming access to (binary) labels indicating whether a document sentence should be in the summary. In contrast to the plethora of datasets available for abstractive summarization (typically thousands of document-abstract pairs; see Section 5 for examples), there are no large-scale datasets with gold sentence labels for extractive summarization. Oracle labels are thus extrapolated from abstracts via heuristics, amongst which greedy search (Nallapati et al., 2017) is by far the most popular (Liu & Lapata, 2019; Xu et al., 2020; Dou et al., 2021; Jia et al., 2022).

In this work we challenge received wisdom and rethink whether greedy search is the best way to create sentence labels for extractive summarization. Specifically, we highlight two flaws with greedy labeling: (1) the search procedure is suboptimal, i.e., it does not guarantee a global optimum for the search objective, and (2) greedy oracles are deterministic, i.e., by aligning document sentences against the corresponding abstract, they yield a single reference extract for any given input.
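To make the greedy labeling scheme concrete, below is a minimal sketch in the spirit of Nallapati et al. (2017): sentences are added one at a time as long as each addition improves overlap with the abstract. For self-containment we use unigram F1 as a stand-in for the ROUGE objective used in practice; the function names and the stopping budget are our own illustrative choices, not the paper's implementation.

```python
from collections import Counter

def unigram_f1(candidate_tokens, reference_tokens):
    """Token-overlap F1, a simple stand-in for ROUGE-1 F1."""
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)

def greedy_oracle(doc_sents, abstract, max_sents=3):
    """Greedily select sentences whose addition most improves overlap
    with the abstract; returns selected indices and binary labels."""
    ref = abstract.split()
    selected, best_score = [], 0.0
    while len(selected) < max_sents:
        best_i = None
        for i, sent in enumerate(doc_sents):
            if i in selected:
                continue
            cand = " ".join(doc_sents[j] for j in sorted(selected + [i])).split()
            score = unigram_f1(cand, ref)
            if score > best_score:
                best_score, best_i = score, i
        if best_i is None:  # no remaining sentence improves the score: stop
            break
        selected.append(best_i)
    labels = [1 if i in selected else 0 for i in range(len(doc_sents))]
    return sorted(selected), labels
```

Note how the early-stopping rule makes the search greedy rather than exhaustive (flaw 1), and how the procedure always returns the same single extract for a given document-abstract pair (flaw 2).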
Perhaps an obvious solution to the suboptimality problem would be to search for oracle summaries with a procedure based on beam search. Although beam search finds better oracles, we empirically observe that summarization models trained on these do not consistently outperform models trained on greedy oracles, possibly due to a higher risk of under-fitting (Narayan et al., 2018a): there are too few positive labels. Moreover, beam search would also create deterministic oracles. A summarization

1 Our code and models can be found at https://github.com/yumoxu/oreo.

