PENALIZING THE HIGH-LIKELIHOOD: A NOVEL SAMPLING METHOD FOR OPEN-ENDED NEURAL TEXT GENERATION VIA INVERSE PROBABILITY WEIGHTING

Abstract

Traditional stochastic sampling methods for open-ended neural text generation focus on truncating the low-likelihood part of the predicted distribution. They do not directly manipulate the high-likelihood part, which leads to the likelihood trap that induces repetition and boredom. Nor do they directly leverage the fact that humans do not always favor high-likelihood texts. Motivated by these observations, we propose a novel sampling method that rescales the high-likelihood part of the distribution with inverse probability weighting. It increases diversity by rescaling and penalizing high-likelihood words, and preserves fluency by applying multi-filtering truncation to low-likelihood words. Using pre-trained language models, we compare our algorithm with traditional sampling methods. Results show that our algorithm significantly increases the diversity and novelty of generated texts without corrupting their fluency.

1. INTRODUCTION

Open-ended neural text generation is greatly affected by the choice of decoding method. Counter-intuitively, quality-oriented decoding methods such as beam search, which maximize the likelihood of decoded texts, induce the well-known text degeneration (Holtzman et al., 2020; Welleck et al., 2020) and likelihood trap (Zhang et al., 2021; Basu et al., 2021): high-likelihood texts are prone to be repetitive, boring, and of low quality. As a result, many works have focused on stochastic sampling methods such as top-k sampling (Fan et al., 2018; Holtzman et al., 2018) or nucleus sampling (top-p sampling; Holtzman et al., 2020). These methods first truncate the low-likelihood part of the language model's predicted distribution, then sample stochastically from the truncated distribution at every decoding time step. Other methods, such as temperature sampling, rescale the log-likelihoods of all words to control the quality of generated texts. Recent works (Caccia et al., 2020; Nadeem et al., 2020; Zhang et al., 2021) reveal that these methods achieve on-par performance in terms of their quality-diversity trade-off. Still, there remain undiscovered properties for better understanding the relationship between stochastic sampling algorithms and open-ended neural text generation (Nadeem et al., 2020). We note that none of the traditional sampling algorithms directly manipulates the high-likelihood part of the distribution, since high-likelihood words are always considered "trustworthy". Yet the quality-likelihood curve observed from human judgments is inversely proportional to likelihood in the high-likelihood region (Zhang et al., 2021), which confirms the intuition that humans do not always favor high-likelihood words (Holtzman et al., 2020; Welleck et al., 2020). Inspired by these findings, we propose a novel sampling method, namely the interquartile range inverse probability (IQR-IP) sampling algorithm.
It increases the diversity of generated texts by rescaling and penalizing the high-likelihood part of the predicted distribution with inverse probability weighting, and preserves fluency by applying multi-filtering truncation to the low-likelihood part. The rescaled distribution more closely resembles the quality-likelihood curve (such as the human-judgment curve in Figure 1 of Zhang et al., 2021), as illustrated in Figure 1. Empirical results show that our algorithm can increase the diversity and novelty of generated texts without corrupting their fluency.

