GLOBALLY INJECTIVE RELU NETWORKS

Abstract

Injectivity plays an important role in generative models, where it enables inference; in inverse problems and compressed sensing with generative priors, it is a precursor to well-posedness. We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and sufficient for injectivity, by constructing appropriate weight matrices. We show that global injectivity with iid Gaussian matrices, a commonly used tractable model, requires a larger expansivity, between 3.4 and 10.5. We also characterize the stability of inverting an injective network via worst-case Lipschitz constants of the inverse. We then use arguments from differential topology to study injectivity of deep networks and prove that any Lipschitz map can be approximated by an injective ReLU network. Finally, using an argument based on random projections, we show that an end-to-end (rather than layerwise) doubling of the dimension suffices for injectivity. Our results establish a theoretical basis for the study of nonlinear inverse and inference problems using neural networks.

1. INTRODUCTION

Many applications of deep neural networks require inverting them on their range. Given a neural network N : Z → X, where X is often the Euclidean space R^m and Z is a lower-dimensional space, the map N^{-1} : N(Z) → Z is only well-defined when N is injective. The issue of injectivity is particularly salient in two applications: generative models and (nonlinear) inverse problems. Generative networks model a complicated distribution p_X over X as a pushforward of a simple distribution p_Z through N. Given an x in the range of N, inference requires computing p_Z(N^{-1}(x)), which is well-posed only when N is injective. In the analysis of inverse problems (Arridge et al., 2019), uniqueness of a solution is a key concern; it is tantamount to injectivity of the forward operator. Given a forward model that is known to yield uniqueness, a natural question is whether we can design a neural network that approximates it arbitrarily well while preserving uniqueness. Similarly, in compressed sensing with a generative prior N and a possibly nonlinear forward operator A injective on the range of N, we seek a latent code z such that A(N(z)) is close to some measured y = A(x). This is again well-posed only when N can be inverted on its range (Balestriero et al., 2020). Beyond these motivations, injectivity is a fundamental mathematical property with numerous implications. We mention one notable example: certain injective generators can be trained with sample complexity that is polynomial in the image dimension (Bai et al., 2018).
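The latent-code search described above can be sketched in a few lines. The single-layer ReLU generator, the linear measurement operator A, and the step size below are illustrative assumptions of ours, not the paper's setup; the point is only to show the recovery problem min_z ||A(N(z)) - y||^2 being attacked by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical): latent dim n, image dim d, measurement dim k.
n, d, k = 4, 32, 12

W = rng.standard_normal((d, n)) / np.sqrt(d)   # generator weights: N(z) = ReLU(W z)
A = rng.standard_normal((k, d)) / np.sqrt(k)   # linear measurement operator

def generator(z):
    return np.maximum(W @ z, 0.0)

# Ground-truth latent code and its compressive measurements y = A(N(z*)).
z_true = rng.standard_normal(n)
y = A @ generator(z_true)

# Recover z by gradient descent on the squared misfit ||A N(z) - y||^2.
z0 = 0.1 * rng.standard_normal(n)   # nonzero init so the ReLU mask is not all-zero
z = z0.copy()
for _ in range(5000):
    pre = W @ z                        # pre-activations
    r = A @ np.maximum(pre, 0.0) - y   # residual in measurement space
    mask = (pre > 0).astype(float)     # ReLU derivative (subgradient at 0)
    grad = W.T @ (mask * (A.T @ r))    # chain rule through ReLU(W z)
    z -= 0.02 * grad

print(np.linalg.norm(A @ generator(z) - y))  # misfit, typically small after descent
```

Note that the objective is only meaningful as an inference procedure when N is injective on Z: otherwise several latent codes explain the same measurements.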

1.1. OUR RESULTS

In this paper we study injectivity of neural networks with ReLU activations. Our contributions can be divided into layerwise results and multilayer results.

Layerwise results. For a ReLU layer f : R^n → R^m we derive necessary and sufficient conditions for invertibility on the range. For the first time, we construct deterministic injective ReLU layers with minimal expansivity m = 2n. We then derive specialized results for convolutional layers, stated in terms of filter kernels instead of weight matrices. We also prove upper and lower bounds on the minimal expansivity of globally injective layers with iid Gaussian weights, generalizing certain existing pointwise results (Theorem 2 and Appendix A.2). We finally derive worst-case Lipschitz constants of the inverse.

Multilayer results. A natural question is whether injective models are sufficiently expressive. Using techniques from differential topology we prove that injective networks are universal in the following sense: if a neural network N_1 : Z → R^{2n+1} models the data, Z ⊂ R^n, then we can approximate N_1 by an injective neural network N_2 : Z → R^{2n+1}. As N_2 is injective, the image set N_2(Z) is a Lipschitz manifold. We then use an argument based on random projections to show that an end-to-end expansivity by a factor of ≈ 2 suffices for injectivity in ReLU networks, as opposed to the layerwise 2-expansivity implied by the layerwise analysis. We conclude with preliminary numerical experiments showing that imposing injectivity improves inference in GANs while preserving expressivity.
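To make the m = 2n expansivity concrete, here is a sanity check (not necessarily the paper's deterministic construction): stacking an invertible matrix B on top of its negation gives an injective ReLU layer with exactly twice the input dimension, because Bz = ReLU(Bz) - ReLU(-Bz) can always be read off the output.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Any invertible B works; a Gaussian matrix is invertible almost surely.
B = rng.standard_normal((n, n))
W = np.vstack([B, -B])             # m = 2n rows: minimal expansivity factor 2

def layer(z):
    return np.maximum(W @ z, 0.0)  # ReLU(W z)

def invert(x):
    # The two halves of the output are ReLU(Bz) and ReLU(-Bz);
    # their difference recovers Bz exactly, and B is invertible.
    Bz = x[:n] - x[n:]
    return np.linalg.solve(B, Bz)

z = rng.standard_normal(n)
z_hat = invert(layer(z))
print(np.allclose(z, z_hat))  # True: the layer is invertible on its range
```

The same identity shows why such a layer loses no information despite the ReLU: every pre-activation appears with both signs.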

1.2. WHY GLOBAL INJECTIVITY?

Figure 1: An illustration of a ReLU layer N : R^2 → R^3, x = N(z), that is not globally injective. Differently colored regions in the z-space are mapped to regions of the same color in the x-space. While N is locally injective on the pink, blue, and green wedges in z-space, the orange, brown, and violet wedges are mapped to coordinate axes. N is thus not injective on these wedges, which prevents construction of an inverse on the range of N.

The attribute "global" relates to global injectivity of the map N : Z → R^m on the low-dimensional latent space Z; it does not imply global invertibility over R^m, only on the range N(Z) ⊂ R^m. If we train a GAN generator to map iid normal latent vectors to real images from a given distribution, we expect any sampled latent vector to generate a plausible image. We thus desire that any N(z) be produced by a unique latent code z ∈ Z. This is equivalent to global injectivity, or invertibility on the range. Our results relate to the growing literature on using neural generative models for compressed sensing (Bora et al., 2017). They parallel the related guarantees for sparse recovery, where the role of the low-dimensional latent space is played by the set of all k-sparse vectors; one then looks for matrices which map all k-sparse vectors to distinct measurements (Foucart & Rauhut, 2013). As an example, in the illustration in Figure 1, images corresponding to latent codes in the orange, brown, and violet wedges cannot be compressively sensed. Finally, a neural network is often trained to directly reconstruct an image x from its (compressive) low-dimensional measurements y = A(x) without introducing any generative model. In this case, whenever A is Lipschitz, it is immediate that the learned inverse must be injective.
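The failure mode in Figure 1 is easy to reproduce numerically. The 3×2 weight matrix below is our own illustrative choice, not the one behind the figure: on any wedge of z-space where only one row of W is active, all codes sharing the same active pre-activation collapse to the same point on a coordinate axis.

```python
import numpy as np

# A hypothetical 3x2 weight matrix for N(z) = ReLU(W z), chosen for illustration.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])

def N(z):
    return np.maximum(W @ z, 0.0)

# Two distinct latent codes in a wedge where only the first row is active:
z1 = np.array([1.0, -0.2])
z2 = np.array([1.0, -0.5])

print(N(z1), N(z2))  # both map to [1. 0. 0.]: N is not injective on this wedge
```

Whole rays of latent codes are flattened onto the axis, so no inverse can be defined on the range there; this is exactly the situation the DSS condition of the paper rules out.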

1.3. RELATED WORK

Closest to our work are the papers of Bruna et al. (2013), Hand et al. (2018), and Lei et al. (2019). Bruna et al. (2013) study injectivity of pooling motivated by the problem of signal recovery from feature representations. They focus on ℓp-pooling layers; their Proposition 2.2 gives a criterion similar to the DSS (Definition 1) and bi-Lipschitz bounds for a ReLU layer (similar to our Theorem 3). Unlike Theorems 1 and 3, their criterion and Lipschitz bound are in some cases not precisely aligned with injectivity; see Appendix E.1. Compressed sensing with GAN priors requires inverting the generator on its range (Bora et al., 2017; Shah & Hegde, 2018; Wu et al., 2019; Mardani et al., 2018; Hand et al., 2018). Lei et al. (2019) replace the end-to-end inversion by the faster and more accurate layerwise inversion when each layer is injective. They show that with high probability a ReLU layer with an iid normal weight matrix can be inverted about a fixed point if the layer expands at least by a factor of 2.1. This result is related to our Theorem 2, which gives conditions for global injectivity of layers with random matrices. Hand & Voroninski (2017) show that when the weights of a ReLU network obey a certain weight distribution condition, the loss function for the inversion has a strict descent direction everywhere.
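The idea behind layerwise inversion can be sketched as follows. This is our simplification, not the Lei et al. (2019) algorithm: for x = ReLU(Wz), every active output coordinate pins down an exact linear equation w_i · z = x_i, so if the active rows have full column rank a least-squares solve recovers z (we ignore the inequality constraints w_i · z ≤ 0 coming from the inactive coordinates).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 40                      # a strongly expansive layer

W = rng.standard_normal((m, n))

def layer(z):
    return np.maximum(W @ z, 0.0)

def invert_layer(x):
    # Active coordinates satisfy w_i . z = x_i exactly; if enough rows are
    # active (full column rank, which holds with high probability for a
    # Gaussian W this expansive), least squares recovers z exactly.
    active = x > 0
    z, *_ = np.linalg.lstsq(W[active], x[active], rcond=None)
    return z

z = rng.standard_normal(n)
z_hat = invert_layer(layer(z))
print(np.allclose(z, z_hat))
```

Inverting a deep network then amounts to applying such a solve layer by layer from the output backwards, which is exactly why per-layer injectivity, rather than only end-to-end injectivity, is useful in practice.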

