OUT-OF-DISTRIBUTION DETECTION AND SELECTIVE GENERATION FOR CONDITIONAL LANGUAGE MODELS

Abstract

Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and produce overconfident yet incorrect predictions, particularly on out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the next token in an output sequence, and may suffer even worse degradation on OOD inputs, as the prediction is made auto-regressively over many steps. Furthermore, the space of potential low-quality outputs is larger, since arbitrary text can be generated, and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. We also show how our method can be used, under the common and realistic setting of distribution shift, for selective generation (analogous to selective prediction for classification) of high-quality outputs while automatically abstaining from low-quality ones, enabling safer deployment of generative language models.

1. INTRODUCTION

Recent progress in generative language models (Wu et al., 2016a; Radford et al., 2019; Lewis et al., 2020; Raffel et al., 2020; Zhang et al., 2020) has led to quality approaching human performance on research datasets and has opened up the possibility of their wide deployment beyond the academic setting. In realistic user-facing scenarios such as text summarization and translation, user-provided inputs can be expected to deviate significantly from the training data distribution. This violates the independent, identically-distributed (IID) assumption commonly used in evaluating machine learning models. Many have shown that the performance of machine learning models can degrade significantly and in surprising ways on OOD inputs (Nguyen et al., 2014; Goodfellow et al., 2014; Ovadia et al., 2019). For example, an image classifier may detect cows with very high accuracy on its IID test set, but confidently fail to detect a cow when it appears against an unseen background (Murphy, 2023; Nagarajan et al., 2020). This has led to active research on OOD detection in a variety of domains, including vision and text, but focused primarily on classification; Salehi et al. (2021); Bulusu et al. (2020); Ruff et al. (2021) provide comprehensive reviews on this topic.

Conditional language models are typically trained, given an input sequence x = x_1 ... x_L, to auto-regressively generate the next token of an output sequence y = y_1 ... y_T as a classification over the token vocabulary V:

p_θ(y|x) = ∏_{t=1}^{T} p_θ(y_t | y_{<t}, x),  y_t ∈ V.

Consequently, the perils of out-of-distribution inputs are arguably more severe, as (a) errors propagate and magnify through auto-regression, and (b) the space of low-quality outputs is greatly increased, since arbitrary text sequences can be generated. Common errors from text generation models include disfluencies (Holtzman et al., 2020) and factual inaccuracies (Goodrich et al., 2019; Maynez et al., 2020).

A common failure case we observed in abstractive summarization is for the model to output "All images are copyrighted" as the summary for news articles from a publisher (CNN) different from the one it was trained on (BBC) (see Figure A.7). In this work, we propose OOD detection methods for CLMs, using abstractive summarization and translation as case studies. Similar to classification, we show in Section 2.1 that CLMs produce untrustworthy likelihood estimates on OOD examples, making perplexity a poor choice for OOD detection.
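To make the quantities above concrete, the sketch below shows how the autoregressive factorization yields a sequence log-likelihood and perplexity, and how a simple threshold on perplexity could drive selective generation. This is an illustrative toy, not the paper's method: the lookup-table "model", its probabilities, and the threshold value are all invented for demonstration, standing in for a trained CLM p_θ(y_t | y_{<t}, x); the paper's point is precisely that perplexity alone is an unreliable signal under distribution shift.

```python
import math

def token_prob(x, y_prefix, y_t):
    # Hypothetical next-token distribution; a real CLM would run a neural
    # network conditioned on the input x and the prefix y_<t. The table and
    # its probabilities here are invented purely for illustration.
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.7, "dog": 0.3},
        ("the", "cat"): {"<eos>": 0.9, "sat": 0.1},
    }
    dist = table.get(tuple(y_prefix), {})
    return dist.get(y_t, 1e-6)  # small probability floor for unseen tokens

def sequence_log_prob(x, y):
    """log p(y|x) = sum_t log p(y_t | y_<t, x): the autoregressive factorization."""
    return sum(math.log(token_prob(x, y[:t], y[t])) for t in range(len(y)))

def perplexity(x, y):
    """Perplexity = exp(-(1/T) log p(y|x)); lower means the model is more confident."""
    return math.exp(-sequence_log_prob(x, y) / len(y))

def selective_generate(x, y, ppl_threshold=5.0):
    """Abstain (return None) when perplexity exceeds a threshold.

    A naive selective-generation rule; the threshold 5.0 is an arbitrary
    choice for this toy example.
    """
    ppl = perplexity(x, y)
    return (y, ppl) if ppl <= ppl_threshold else (None, ppl)
```

For an in-distribution output like ["the", "cat", "<eos>"] the toy model assigns low perplexity and the rule accepts it, while a sequence the model has never seen hits the probability floor, gets very high perplexity, and is abstained on. The catch, as the paper shows, is that a miscalibrated CLM can assign deceptively low perplexity to degenerate OOD outputs (e.g. "All images are copyrighted"), which is why perplexity alone is insufficient for OOD detection.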

