MIROSTAT: A NEURAL TEXT DECODING ALGORITHM THAT DIRECTLY CONTROLS PERPLEXITY

Abstract

Neural text decoding algorithms strongly influence the quality of texts generated using language models, but popular algorithms like top-k, top-p (nucleus), and temperature-based sampling may yield texts that have objectionable repetition or incoherence. Although these methods generate high-quality text after ad hoc parameter tuning that depends on the language model and the length of generated text, not much is known about the control they provide over the statistics of the output. This is important, however, since recent reports show that humans prefer text whose perplexity is neither too high nor too low, and since we experimentally show that cross-entropy (log of perplexity) has a near-linear relation with repetition. First we provide a theoretical analysis of perplexity in top-k, top-p, and temperature sampling, under Zipfian statistics. Then, we use this analysis to design a feedback-based adaptive top-k text decoding algorithm called mirostat that generates text (of any length) with a predetermined target value of perplexity without any tuning. Experiments show that for low values of k and p, perplexity drops significantly with generated text length and leads to excessive repetitions (the boredom trap). Conversely, for large values of k and p, perplexity increases with generated text length and leads to incoherence (the confusion trap). Mirostat avoids both traps. Specifically, we show that setting the target perplexity beyond a threshold yields negligible sentence-level repetition. Experiments with human raters for fluency, coherence, and quality further verify our findings.

1. INTRODUCTION

Large-scale generative language models (LMs) have received recent attention due to their high-quality open-ended text generation ability (Brown et al., 2020; Radford et al., 2019). Generating text from these LMs usually relies on some form of random sampling. Pure sampling often leads to incoherent and low-quality texts (Holtzman et al., 2018), whereas greedy decoding leads to excessive repetitions, another form of low quality. The right decoding algorithm is needed to generate high-quality texts with controlled attributes (Ippolito et al., 2020; Zhang et al., 2020; Ippolito et al., 2019).

We introduce mirostat,¹ a neural text decoding algorithm that actively controls the generative process to maintain the perplexity of generated text at a certain desired value. Mirostat uses an adaptive top-k sampling algorithm that actively tunes the value of k to maintain the overall perplexity of the text; recall that in top-k sampling (Holtzman et al., 2018; Fan et al., 2018) the next word is sampled from the k most probable choices. Top-k sampling and several other recent sampling methods are motivated by suppressing an unreliable tail in the probability distribution of trained LMs. Another sampling method is top-p, also known as nucleus sampling, where the next word is sampled from the smallest set of most probable words whose cumulative probability exceeds p.

¹The word mirostat is derived from mirum, which is Latin for surprise, and stat, meaning control. This work was funded in part by the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR), a research collaboration as part of the IBM AI Horizons Network, and by National Science Foundation Grant CCF-1717530.
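To make the sampling methods above concrete, the following sketch implements top-k and top-p truncation of a next-token distribution, plus a minimal mirostat-style feedback step that adapts k to steer the observed surprise (negative log-probability, whose average is the log of perplexity) toward a target. This is an illustration only, not the authors' implementation; the function names, the multiplicative update rule, and the learning rate `lr` are assumptions for exposition.

```python
import numpy as np


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    idx = np.argsort(probs)[::-1][:k]
    out = np.zeros_like(probs)
    out[idx] = probs[idx]
    return out / out.sum()


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability exceeds p, then renormalize (nucleus sampling)."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # smallest prefix with mass > p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()


def mirostat_style_step(probs, k, target_surprise, lr=0.5, rng=None):
    """Sample one token via top-k, then adjust k by feedback on the
    observed surprise of the sampled token (illustrative update rule)."""
    rng = rng or np.random.default_rng()
    filtered = top_k_filter(probs, k)
    token = int(rng.choice(len(probs), p=filtered))
    surprise = -np.log2(filtered[token])
    error = surprise - target_surprise
    # If the sampled token was more surprising than the target, shrink k
    # (lowering perplexity); if less surprising, grow k.
    new_k = max(1, int(round(k * np.exp(-lr * error))))
    return token, new_k
```

Repeating `mirostat_style_step` over a generation run keeps the running surprise near the target, which is the feedback idea behind mirostat: fixed k or p leaves perplexity to drift with text length, whereas the adaptive loop corrects the drift at every step.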

