IMPROVED AUTOREGRESSIVE MODELING WITH DISTRIBUTION SMOOTHING

Abstract

While autoregressive models excel at image compression, their sample quality is often lacking. Generated images, although unrealistic, are often assigned high likelihood by the model, resembling the case of adversarial examples. Inspired by a successful adversarial defense method, we incorporate randomized smoothing into autoregressive generative modeling. We first model a smoothed version of the data distribution, and then reverse the smoothing process to recover the original data distribution. This procedure drastically improves the sample quality of existing autoregressive models on several synthetic and real-world image datasets while obtaining competitive likelihoods on synthetic datasets.

1. INTRODUCTION

Autoregressive models have exhibited promising results in a variety of downstream tasks. For instance, they have shown success in compressing images (Minnen et al., 2018), synthesizing speech (Oord et al., 2016a), and modeling complex decision rules in games (Vinyals et al., 2019). However, the sample quality of autoregressive models on real-world image datasets is still lacking.

Poor sample quality might be explained by the manifold hypothesis: many real-world data distributions (e.g. natural images) lie in the vicinity of a low-dimensional manifold (Belkin & Niyogi, 2003), leading to complicated densities with sharp transitions (i.e. high Lipschitz constants), which are known to be difficult to model for density models such as normalizing flows (Cornish et al., 2019). Since each conditional of an autoregressive model is a 1-dimensional normalizing flow (given a fixed context of previous pixels), a high Lipschitz constant will likely hinder learning of autoregressive models.

Another reason for poor sample quality is the "compounding error" issue in autoregressive modeling. An autoregressive model relies on the previously generated context to make a prediction; once a mistake is made, the model is likely to make further mistakes that compound (Kääriäinen, 2006), eventually resulting in questionable and unrealistic samples. Intuitively, one would expect the model to assign low likelihoods to such unrealistic images; however, this is not always the case. In fact, the generated samples, although appearing unrealistic, are often assigned high likelihoods by the autoregressive model, resembling an "adversarial example" (Szegedy et al., 2013; Biggio et al., 2013), an input that causes the model to output an incorrect answer with high confidence. Inspired by the recent success of randomized smoothing techniques in adversarial defense (Cohen et al., 2019), we propose to apply randomized smoothing to autoregressive generative modeling.
More specifically, we propose to address a density estimation problem via a two-stage process. Unlike Cohen et al. (2019), which applies smoothing to the model to make it more robust, we apply smoothing to the data distribution. Specifically, we convolve a symmetric and stationary noise distribution with the data distribution to obtain a new "smoother" distribution. In the first stage, we model the smoothed version of the data distribution using an autoregressive model. In the second stage, we reverse the smoothing process (a procedure which can also be understood as "denoising") by either applying a gradient-based denoising approach (Alain & Bengio, 2014) or introducing another conditional autoregressive model to recover the original data distribution from the smoothed one. By choosing an appropriate smoothing distribution, we aim to make each step easier than the original learning problem: smoothing facilitates learning in the first stage by making the input distribution
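The two-stage procedure can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's method: it assumes a 1-D Gaussian data distribution (so the smoothed density and its score are available in closed form, with hypothetical choices of `sigma_d` and `sigma`), convolves the data with Gaussian noise (stage 1), and then applies single-step gradient-based denoising in the spirit of Alain & Bengio (2014), where the posterior mean is the noisy sample plus the noise variance times the score of the smoothed density. In practice both the smoothed density and the denoiser would be learned models rather than closed-form expressions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: data distribution p(x) = N(0, sigma_d^2).
sigma_d = 1.0   # std of the data distribution
sigma = 0.5     # std of the Gaussian smoothing noise

# Samples from the data distribution.
x = rng.normal(0.0, sigma_d, size=10_000)

# Stage 1: smooth the data distribution by convolving with N(0, sigma^2).
# The smoothed samples follow q = p * N(0, sigma^2) = N(0, sigma_d^2 + sigma^2).
x_tilde = x + rng.normal(0.0, sigma, size=x.shape)

# Stage 2: gradient-based denoising. The single-step estimate is
#   E[x | x_tilde] = x_tilde + sigma^2 * d/dx log q(x_tilde),
# and for this Gaussian toy case the score of q is available in closed form.
score = -x_tilde / (sigma_d**2 + sigma**2)
x_hat = x_tilde + sigma**2 * score

# Denoising reduces the mean-squared error to the original samples.
print("smoothed variance:", np.var(x_tilde))          # ~ sigma_d^2 + sigma^2
print("MSE before denoising:", np.mean((x_tilde - x) ** 2))
print("MSE after denoising: ", np.mean((x_hat - x) ** 2))
```

In higher dimensions the closed-form score would be replaced by the gradient of the learned autoregressive model of the smoothed distribution, or by a second conditional model trained to invert the smoothing.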

