IDF++: ANALYZING AND IMPROVING INTEGER DISCRETE FLOWS FOR LOSSLESS COMPRESSION

Abstract

In this paper we analyze and improve integer discrete flows for lossless compression. Integer discrete flows are a recently proposed class of models that learn invertible transformations for integer-valued random variables. Their discrete nature makes them particularly suitable for lossless compression with entropy coding schemes. We start by investigating a recent theoretical claim that states that invertible flows for discrete random variables are less flexible than their continuous counterparts. We prove that this claim does not hold for integer discrete flows, due to the embedding of data with finite support into the countably infinite integer lattice. Furthermore, we zoom in on the effect of gradient bias due to the straight-through estimator in integer discrete flows, and demonstrate that its influence is highly dependent on architecture choices and less prominent than previously thought. Finally, we show how different architecture modifications improve the performance of this model class for lossless compression, and that they also enable more efficient compression: a model with half the number of flow layers performs on par with or better than the original integer discrete flow model.

1. INTRODUCTION

Density estimation algorithms that minimize the cross-entropy between a data distribution and a model distribution can be interpreted as lossless compression algorithms, because the cross-entropy upper-bounds the data entropy. While autoregressive neural networks (Uria et al., 2014; Theis & Bethge, 2015; Oord et al., 2016; Salimans et al., 2017) and variational auto-encoders (Kingma & Welling, 2013; Rezende & Mohamed, 2015) have been connected to lossless compression in practice for some time, normalizing flows were only recently used for lossless compression. Most normalizing flow models are designed for real-valued data, which complicates an efficient connection with entropy coders, since entropy coders require discretized data. However, normalizing flows for real-valued data were recently connected to bits-back coding by Ho et al. (2019b), opening up the possibility for efficient dataset compression with high compression rates. Orthogonal to this, Tran et al. (2019) and Hoogeboom et al. (2019a) introduced normalizing flows for discrete random variables, and Hoogeboom et al. (2019a) demonstrated that integer discrete flows can be connected directly to entropy coders without the need for bits-back coding.

In this paper we aim to improve integer discrete flows for lossless compression. Recent literature has proposed several hypotheses on the weaknesses of this model class, which we investigate as potential directions for improving compression performance. More specifically, we start by discussing the claim by Papamakarios et al. (2019) on the flexibility of normalizing flows for discrete random variables, and we show that this limitation on flexibility does not apply to integer discrete flows. We then discuss the potential influence of gradient bias on the training of integer discrete flows, and demonstrate that other, less biased gradient estimators do not improve final results. Furthermore, through a numerical analysis on a toy example, we show that straight-through gradient estimates for 8-bit data correlate well with finite-difference estimates of the gradient. We also demonstrate that the previously observed performance degradation as a function of the number of flow layers is highly dependent on the architecture of the coupling layers. Motivated by this last finding, we introduce several architecture changes that improve the performance of this model class on lossless image compression.
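To make the opening claim concrete, the following is a minimal numerical sketch (not from the paper; the distributions are illustrative) of why cross-entropy upper-bounds entropy: an entropy coder driven by a model q spends on average H(p, q) = -Σ p(x) log2 q(x) bits per symbol, which is at least the data entropy H(p), with equality only when q matches p.

```python
import numpy as np

def entropy_bits(p):
    # Shannon entropy H(p) = -sum p log2 p: the optimal expected
    # codelength in bits per symbol for data distributed as p.
    return -np.sum(p * np.log2(p))

def cross_entropy_bits(p, q):
    # H(p, q) = -sum p log2 q: expected codelength when the entropy
    # coder uses model q while the data actually follows p.
    return -np.sum(p * np.log2(q))

p = np.array([0.5, 0.25, 0.125, 0.125])  # data distribution (assumed)
q = np.array([0.25, 0.25, 0.25, 0.25])   # mismatched model

print(entropy_bits(p))           # 1.75 bits per symbol
print(cross_entropy_bits(p, q))  # 2.0 bits: the gap is KL(p || q)
print(cross_entropy_bits(p, p))  # 1.75: equals H(p) when q = p
```

Minimizing the cross-entropy of a density model is therefore the same objective as minimizing the expected codelength of the corresponding lossless compressor.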
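The comparison between straight-through and finite-difference gradients can be illustrated on a one-parameter toy problem (a sketch under assumed choices, not the paper's model: the objective `loss`, parameter `t`, and input distribution are all hypothetical). The exact gradient through rounding is zero almost everywhere, so we compare the straight-through surrogate against a central finite difference of the loss averaged over many inputs:

```python
import numpy as np

def loss(t, x):
    # Toy objective with a rounding non-linearity: the true gradient
    # w.r.t. the parameter t is zero almost everywhere.
    return (np.round(t * x) - 2.0) ** 2

def ste_grad(t, x):
    # Straight-through estimate: backpropagate as if round(u) = u,
    # while keeping the rounded value in the forward computation.
    return 2.0 * (np.round(t * x) - 2.0) * x

def finite_diff_grad(t, xs, eps=0.5):
    # Central finite difference of the loss averaged over many inputs;
    # averaging over x smooths the staircase, so the difference
    # quotient tracks the gradient of the expected loss.
    f = lambda tt: np.mean(loss(tt, xs))
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)

rng = np.random.default_rng(0)
xs = rng.uniform(1.0, 3.0, size=100_000)
t = 1.3
ste = np.mean(ste_grad(t, xs))
fd = finite_diff_grad(t, xs)
print(ste, fd)  # the two estimates agree in sign and rough magnitude
```

In this toy setting the two estimates point in the same direction and have similar magnitude, mirroring the paper's observation that straight-through gradients for 8-bit data correlate well with finite-difference estimates.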

