HOW (UN)FAIR IS TEXT SUMMARIZATION?

Abstract

Creating a good summary requires carefully choosing details from the original text to accurately represent it in a limited space. If a summary contains biased information about a group, it risks passing this bias off to readers as fact. These risks increase if we consider not just one biased summary, but rather a biased summarization algorithm. Despite this, no work has measured whether automatic summarizers exhibit biased performance. Rather, most work in summarization focuses on improving performance, ignoring questions of bias. In this paper, we demonstrate that automatic summarizers both amplify and introduce bias towards information about under-represented groups. Additionally, we show that summarizers are highly sensitive to document structure, making the summaries they generate unstable under changes that are semantically meaningless to humans, which poses a further fairness risk. Given these results, and the large-scale potential for harm presented by biased summarization, we recommend that bias analysis be performed and reported on summarizers to ensure that new automatic summarization methods do not introduce bias to the summaries they generate.

1. INTRODUCTION

In any piece of text, bias against a group may be expressed. This bias may be explicit or implicit and can be displayed in what information is included (e.g., including information that is exclusively negative about one group and exclusively positive about another), where in the article it comes from (e.g., only selecting sentences from the start of articles), or how it is written (e.g., saying "a man thought to be involved in crime died last night after an officer involved shooting" vs. "a police officer shot and killed an unarmed man in his home last night"). The presence of any bias in a longer text may be made worse by summarizing it. A summary can be seen as a presentation of the most salient points of a larger piece of text, where the definition of "salient information" will vary according to the ideologies a person holds. Due to this subjectivity and the space constraints a summary imposes, there is a heightened potential for summaries to contain bias. Readers, however, expect that summaries faithfully represent articles. Therefore, bias in the text of a summary is likely to go unquestioned. If a summary presents information in a way biased against a group, readers are likely to believe that the article exhibited this same bias, as checking the truth of these assumptions requires a high amount of effort. This poses several risks. First, there is an echo chamber effect, in which the bias in generated summaries agrees with biases the reader already holds. The opposite is also a risk: an article may present roughly equal amounts of information about multiple groups while its summary includes more information about one group, leading readers to believe the most important information in the article concerns only one group. As writing summaries manually carries a large cost, automatic summarization is an appealing solution.
However, where one biased summary is a problem, a biased summarization algorithm capable of summarizing thousands of articles in the time it takes a human to write one is a disaster. In recent years, automatic summarization has become increasingly available for both personal and commercial use. Summarization algorithms have been suggested for use on news articles, medical notes, business documents, legal texts, personal documents [1], and conversation transcripts [2]. Despite the sensitivity of these applications, to the best of our knowledge, no work has measured bias toward groups in the summaries generated by common summarization algorithms.
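The kind of measurement we advocate can be sketched informally: compare how much attention each group receives in an article against how much it receives in that article's summary. The following is a minimal illustration only; the keyword lists, toy documents, and the mention-rate metric itself are hypothetical simplifications, not the methodology used in this work.

```python
def mention_rate(text, keywords):
    """Fraction of sentences that mention any of the given keywords.

    A deliberately naive proxy for group representation: real analyses
    would use proper sentence splitting and entity/group detection.
    """
    sentences = [s for s in text.split(".") if s.strip()]
    hits = sum(any(k in s.lower() for k in keywords) for s in sentences)
    return hits / len(sentences)

# Toy article covering two groups equally (hypothetical data).
article = ("Group A won an award. Group B opened a school. "
           "Group A held a parade. Group B raised funds.")
# A summary that keeps only Group A content, amplifying imbalance.
summary = "Group A won an award. Group A held a parade."

for name, kws in [("A", ["group a"]), ("B", ["group b"])]:
    print(name, mention_rate(article, kws), mention_rate(summary, kws))
```

Here both groups appear in half of the article's sentences, yet the summary devotes all of its sentences to one group, which is precisely the representation gap a bias analysis of a summarizer should surface.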

[1] https://ai.googleblog.com/2022/03/auto-generated-summaries-in-google-docs.html?m=1
[2] https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/summarization/overview

