ANYTIME SAMPLING FOR AUTOREGRESSIVE MODELS VIA ORDERED AUTOENCODING

Abstract

Autoregressive models are widely used for tasks such as image and audio generation. The sampling process of these models, however, does not allow interruptions and cannot adapt to real-time computational resources. This challenge impedes the deployment of powerful autoregressive models, which involve a slow sampling process that is sequential in nature and typically scales linearly with respect to the data dimension. To address this difficulty, we propose a new family of autoregressive models that enables anytime sampling. Inspired by Principal Component Analysis, we learn a structured representation space where dimensions are ordered based on their importance with respect to reconstruction. Using an autoregressive model in this latent space, we trade off sample quality for computational efficiency by truncating the generation process before decoding into the original data space. Experimentally, we demonstrate in several image and audio generation tasks that sample quality degrades gracefully as we reduce the computational budget for sampling. The approach suffers almost no loss in sample quality (measured by FID) using only 60% to 80% of all latent dimensions for image data. Code is available at https://github.com/Newbeeer/Anytime-Auto-Regressive-Model.

1. INTRODUCTION

Autoregressive models are a prominent approach to data generation, and have been widely used to produce high quality samples of images (Oord et al., 2016b; Salimans et al., 2017; Menick & Kalchbrenner, 2018), audio (Oord et al., 2016a), video (Kalchbrenner et al., 2017) and text (Kalchbrenner et al., 2016; Radford et al., 2019). These models represent a joint distribution as a product of (simpler) conditionals, and sampling requires iterating over all these conditional distributions in a certain order. Due to the sequential nature of this process, the computational cost will grow at least linearly with respect to the number of conditional distributions, which is typically equal to the data dimension. As a result, the sampling process of autoregressive models can be slow and does not allow interruptions. Although caching techniques have been developed to speed up generation (Ramachandran et al., 2017; Guo et al., 2017), the high cost of sampling limits their applicability in many scenarios. For example, when running on multiple devices with different computational resources, we may wish to trade off sample quality for faster generation based on the computing power available on each device. Currently, a separate model must be trained for each device (i.e., computational budget) in order to
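The sequential sampling procedure described above can be sketched with a minimal toy example. This is an illustrative sketch, not the paper's implementation: the conditional sampler, the `budget` argument (mimicking the anytime truncation proposed here), and the toy Gaussian conditional are all assumptions introduced for exposition.

```python
import numpy as np

def sample_autoregressive(conditional_sampler, dim, budget=None):
    """Sequentially draw x_1, ..., x_dim from p(x_i | x_<i).

    `budget` is a hypothetical cap on the number of conditionals
    evaluated, mimicking early truncation of the generation process.
    """
    steps = dim if budget is None else min(budget, dim)
    x = []
    for _ in range(steps):
        # Each step conditions on every previously generated value,
        # so the total cost scales linearly with `steps`.
        x.append(conditional_sampler(np.array(x)))
    return np.array(x)

# Toy conditional: a Gaussian whose mean is the running mean of x_<i.
rng = np.random.default_rng(0)
toy = lambda prefix: rng.normal(loc=prefix.mean() if prefix.size else 0.0)

full = sample_autoregressive(toy, dim=10)               # all 10 conditionals
partial = sample_autoregressive(toy, dim=10, budget=6)  # truncated early
print(len(full), len(partial))  # 10 6
```

In a standard autoregressive model over raw data, truncating this loop yields an incomplete sample; the ordered latent space proposed in this paper is what makes such truncation produce a complete (if lower-fidelity) sample after decoding.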

