ON THE INVERSION OF DEEP GENERATIVE MODELS

Abstract

Deep generative models (e.g. GANs and VAEs) have been developed quite extensively in recent years. Lately, there has been an increased interest in the inversion of such a model, i.e. given a (possibly corrupted) signal, we wish to recover the latent vector that generated it. Building upon sparse representation theory, we define conditions that rely only on the cardinalities of the hidden layer and are applicable to any inversion algorithm (gradient descent, deep encoder, etc.), under which such generative models are invertible with a unique solution. Importantly, the proposed analysis is applicable to any trained model, and does not depend on Gaussian i.i.d. weights. Furthermore, we introduce two layer-wise inversion pursuit algorithms for trained generative networks of arbitrary depth, where one of them is accompanied by recovery guarantees. Finally, we validate our theoretical results numerically and show that our method outperforms gradient descent when inverting such generators, both for clean and corrupted signals.

1. INTRODUCTION

In the past several years, deep generative models, e.g. Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) and Variational Auto-Encoders (VAEs) (Kingma & Welling, 2013), have been greatly developed, leading to networks that can generate images, videos, and speech, among other signals, that look and sound authentic to humans. Loosely speaking, these models learn, in an unsupervised manner, a mapping from a random low-dimensional latent space to the training data distribution. Interestingly, deep generative models are not used only to generate arbitrary signals. Recent works rely on the inversion of these models to perform visual manipulations, compressed sensing, image interpolation, and more (Zhu et al., 2016; Bora et al., 2017; Simon & Aberdam, 2020). In this work, we study this inversion task. Formally, denoting the signal to invert by y ∈ R^n, the generative model by G : R^{n_0} → R^n, and the latent vector by z ∈ R^{n_0}, we study the following problem:

z^* = \arg\min_z \frac{1}{2} \|G(z) - y\|_2^2,    (1)

where G is assumed to be a feed-forward neural network. The first question that comes to mind is whether this model is invertible, or equivalently, does Equation 1 have a unique solution? In this work, we establish theoretical conditions that guarantee the invertibility of the model G. Notably, the provided theorems are applicable to general non-random generative models, and do not depend on the chosen inversion algorithm. Once the existence of a unique solution is established, the next challenge is to provide a recovery algorithm that is guaranteed to obtain the sought solution. A common and simple approach is to draw a random vector z and iteratively update it using gradient descent, aiming to minimize Equation 1 (Zhu et al., 2016; Bora et al., 2017). Unfortunately, since the inversion problem is generally non-convex, this approach has theoretical guarantees only in limited scenarios (Hand et al., 2018; Hand & Voroninski, 2019).
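To make the gradient-descent baseline concrete, here is a minimal sketch for a toy *linear* generator G(z) = Wz (an illustrative stand-in, not a model from the paper). In this convex special case gradient descent provably succeeds; it is precisely the non-convexity of deep G that breaks such guarantees:

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n = 4, 16                      # latent and signal dimensions (n > n0: expansive)
W = rng.standard_normal((n, n0))   # toy generator weights: G(z) = W @ z

z_true = rng.standard_normal(n0)
y = W @ z_true                     # clean signal to invert

z = rng.standard_normal(n0)        # random initialization, as in Equation 1's baseline
lr = 0.02
for _ in range(1000):
    grad = W.T @ (W @ z - y)       # gradient of 0.5 * ||G(z) - y||_2^2
    z -= lr * grad

print(np.linalg.norm(z - z_true))  # near zero: the convex case is recovered exactly
```

For a deep G with ReLU activations, the same loop can get stuck in spurious stationary points, which motivates the layer-wise alternative studied in this work.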
An alternative approach is to train an encoding neural network that maps images to their latent vectors (Zhu et al., 2016; Donahue et al., 2016; Bau et al., 2019; Simon & Aberdam, 2020); however, this method is not accompanied by any theoretical justification. We adopt a third approach in which the generative model is inverted in an analytical fashion. Specifically, we perform the inversion layer-by-layer, similar to Lei et al. (2019). Our approach is based on the observation that every hidden layer is an outcome of a weight matrix multiplying a sparse vector, followed by a ReLU activation. By utilizing sparse representation theory, the proposed algorithm ensures perfect recovery in the noiseless case and a bounded estimation error in the noisy one. Moreover, we show numerically that our algorithm outperforms gradient descent in several tasks, including the reconstruction of noiseless and corrupted images.

Main contributions: The contributions of this work are both theoretical and practical. We derive theoretical conditions for the invertibility of deep generative models by ensuring a unique solution for the inversion problem defined in Equation 1. In short, these conditions rely on the growth of the number of non-zero elements across consecutive hidden layers by a factor of 2 for trained networks, and by any constant greater than 1 for random models. Then, by leveraging the inherent sparsity of the hidden layers, we introduce a layer-wise inversion algorithm with provable guarantees in the noiseless and noisy settings for fully-connected generators. To the best of our knowledge, this is the first work that provides such guarantees for general (non-random) models, addressing both the conceptual inversion and provable algorithms for solving Equation 1. Finally, we provide numerical experiments demonstrating the superiority of our approach over gradient descent in various scenarios.
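The layer-wise viewpoint reduces each inversion step to a sparse-coding problem: a hidden layer is a weight matrix times a sparse vector. The following sketch illustrates only that core step, recovering a sparse vector from its linear image via a generic orthogonal matching pursuit; it deliberately ignores the ReLU handling and is not the paper's algorithm, and all dimensions and names are illustrative:

```python
import numpy as np

def omp(A, v, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse h with A @ h ≈ v."""
    residual, support = v.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # atom most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], v, rcond=None)
        residual = v - A[:, support] @ coef          # re-fit, then update residual
    h = np.zeros(A.shape[1])
    h[support] = coef
    return h

rng = np.random.default_rng(1)
m, p, k = 50, 100, 3
W = rng.standard_normal((m, p)) / np.sqrt(m)         # stand-in layer weights
h_true = np.zeros(p)
h_true[rng.choice(p, size=k, replace=False)] = np.array([1.5, -2.0, 1.0])
v = W @ h_true                                       # pre-activation of the next layer
h_hat = omp(W, v, k)
print(np.linalg.norm(h_hat - h_true))                # near zero when the support is found
```

The guarantees in the paper hinge on conditions under which such a pursuit step is exact at every layer, which is where the cardinality-growth assumptions enter.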

Layer-wise inversion:

The closest work to ours, and indeed its source of inspiration, is Lei et al. (2019), which proposes a novel scheme for inverting generative models. By assuming that the input signal was corrupted by noise bounded in the ℓ1 or ℓ∞ norm, they suggest inverting the model layer-by-layer using linear programs. That said, to ensure a stable inversion, their analysis is restricted to cases where: (i) the network weights are Gaussian i.i.d. variables; (ii) the layers expand such that the number of non-zero elements in each layer is larger than the size of the entire layer preceding it; and (iii) the last activation function is either ReLU or leaky-ReLU. Unfortunately, as mentioned in their work, these three assumptions often do not hold in practice. In this work, we rely neither on the distribution of the weights nor on the chosen activation function of the last layer. Furthermore, we relax the expansion assumption to rely only on the expansion of the number of non-zero elements. This relaxation is especially needed in the last hidden layer, which is typically larger than the image size.

Neural networks and sparse representation: In the search for a profound theoretical understanding of deep learning, a series of papers suggested a connection between neural networks and sparse coding, by demonstrating that the forward pass of a neural network is in fact a pursuit for a multi-layer sparse representation (Papyan et al., 2017; Sulam et al., 2018; Chun & Fessler, 2019; Sulam et al., 2019; Romano et al., 2019; Xin et al., 2016). In this work, we expand this proposition by showing that the inversion of a generative model is based on sequential sparse coding steps.
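One concrete instance of this forward-pass-as-pursuit connection, sketched here as a numerical check rather than taken from any of the cited works' code: a biased ReLU, ReLU(u − λ), is the proximal operator of a non-negative ℓ1 penalty, i.e. it solves min_{h ≥ 0} ½(h − u)² + λh entry-wise, so applying it after a linear layer can be read as one step of a layered thresholding pursuit:

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

lam = 0.5
grid = np.linspace(0.0, 5.0, 50001)   # dense grid over the feasible set h >= 0 (truncated)
for u in [-1.0, 0.2, 0.5, 1.7, 3.3]:
    # objective of the one-dimensional non-negative LASSO
    objective = 0.5 * (grid - u) ** 2 + lam * grid
    h_star = grid[np.argmin(objective)]
    # the grid minimizer coincides with the biased ReLU, up to grid resolution
    assert abs(h_star - relu(u - lam)) < 1e-3
print("soft non-negative thresholding matches ReLU(u - lam)")
```

This is only the simplest single-step version of the multi-layer pursuit results cited above, but it conveys why sparse representation theory is a natural lens for ReLU networks.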

2. THE GENERATIVE MODEL

Notations: We use bold uppercase letters to represent matrices, and bold lowercase letters to represent vectors. The vector w_j represents the j-th column of the matrix W. Similarly, the vector w_{i,j} represents the j-th column of the matrix W_i. The activation function ReLU is the entry-wise operator ReLU(u) = max{u, 0}. We denote by spark(W) the smallest number of columns in W that are linearly dependent, and by \|x\|_0 the number of non-zero elements in x. The mutual

