REDUCING THE COMPUTATIONAL COST OF DEEP GENERATIVE MODELS WITH BINARY NEURAL NETWORKS

Abstract

Deep generative models provide a powerful set of tools for understanding real-world data. But as these models improve, they grow in size and complexity, so their computational cost in memory and execution time increases. Using binary weights in neural networks is one method that has shown promise in reducing this cost. However, whether binary neural networks can be used in generative models is an open problem. In this work we show, for the first time, that generative models which utilize binary neural networks can be trained successfully, substantially reducing their computational cost. We develop a new class of binary weight normalization, and provide insights into architecture design for these binarized generative models. We demonstrate that two state-of-the-art deep generative models, the ResNet VAE and Flow++ models, can be binarized effectively using these techniques. We train binary models that achieve loss values close to those of the full-precision models while being 90-94% smaller in size, and that also allow significant speed-ups in execution time.

1. INTRODUCTION

As machine learning models continue to grow in number of parameters, there is a corresponding effort to reduce the ever-increasing memory and computational requirements that these models incur. One method to make models more efficient is to use neural networks with weights, and possibly activations, restricted to be binary-valued (Courbariaux et al., 2015; Courbariaux et al., 2016; Rastegari et al., 2016; McDonnell, 2018; Gu et al., 2018). Binary weights and activations require significantly less memory, and also admit faster low-level implementations of key operations, such as linear transformations, than the usual floating-point precision allows. Although the application of binary neural networks to classification is relatively well-studied, there has been no research that we are aware of examining whether binary neural networks can be used effectively in unsupervised learning problems. Indeed, many of the deep generative models that are popular for unsupervised learning have high parameter counts and are computationally expensive (Vaswani et al., 2017; Maaløe et al., 2019; Ho et al., 2019a). These models would stand to benefit significantly from converting the weights and activations to binary values, which we call binarization for brevity.

In this work we focus on non-autoregressive models with explicit densities. One such class of density model is the variational autoencoder (VAE) (Kingma & Welling, 2014; Rezende et al., 2014), a latent variable model which has been used to model many high-dimensional data domains accurately. The state-of-the-art VAE models tend to have deep hierarchies of latent layers, and have demonstrated good performance relative to comparable modelling approaches (Ranganath et al., 2016; Kingma et al., 2016; Maaløe et al., 2019). Whilst this deep hierarchy makes the model powerful, the model size and compute requirements increase with the number of latent layers, making very deep models resource intensive.
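To make the memory argument concrete, the sketch below shows one common form of deterministic weight binarization: weights are mapped to {-1, +1} via the sign function and rescaled by a per-layer factor equal to the mean absolute weight, in the style of XNOR-Net (Rastegari et al., 2016). This is an illustrative example with hypothetical helper names, not the binary weight normalization developed in this paper; the real-valued weights are retained during training and gradients pass through the sign function via a straight-through estimator.

```python
import numpy as np

def binarize(w):
    """Map real-valued weights to {-alpha, +alpha}, where
    alpha = mean(|w|) is a per-layer scaling factor.
    Zeros are mapped to +alpha so every weight is binary.
    (Illustrative sketch, not the paper's normalization.)"""
    alpha = np.abs(w).mean()
    return alpha * np.sign(np.where(w == 0, 1.0, w)), alpha

def binary_linear(x, w_real, b):
    """Forward pass of a linear layer with binarized weights.
    In training, w_real is kept in full precision for the update;
    at inference only the 1-bit signs and alpha need be stored."""
    w_bin, _ = binarize(w_real)
    return x @ w_bin + b
```

Storing one bit per weight plus a single scale per layer, instead of 32-bit floats, is what makes the large model-size reductions quoted above possible.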
Another class of density models that has shown promising results is flow-based generative models (Dinh et al., 2014; Rezende & Mohamed, 2015; Dinh et al., 2017). These models perform a series

