ON NOISE INJECTION IN GENERATIVE ADVERSARIAL NETWORKS

Abstract

Noise injection is an effective way of circumventing overfitting and enhancing generalization in machine learning, a rationale that has been validated in deep learning as well. Recently, noise injection has exhibited surprising effectiveness when generating high-fidelity images with Generative Adversarial Networks (e.g., StyleGAN). Despite its successful applications in GANs, the mechanism behind its effectiveness remains unclear. In this paper, we propose a geometric framework to theoretically analyze the role of noise injection in GANs. Based on Riemannian geometry, we model the noise injection framework as fuzzy equivalence on geodesic normal coordinates. Guided by our theory, we find that existing methods are incomplete, and we devise a new strategy for noise injection. Experiments on image generation and GAN inversion demonstrate the superiority of our method.

1. INTRODUCTION

Noise injection is usually applied as a regularizer to cope with overfitting or to facilitate generalization in neural networks (Bishop, 1995; An, 1996). The effectiveness of this simple technique has also been proved in various tasks in deep learning, such as learning deep architectures (Hinton et al., 2012; Srivastava et al., 2014; Noh et al., 2017), defending against adversarial attacks (He et al., 2019), facilitating the stability of differentiable architecture search with reinforcement learning (Liu et al., 2019; Chu et al., 2020), and quantizing neural networks (Baskin et al., 2018). In recent years, noise injection* has attracted more and more attention in the community of Generative Adversarial Networks (GANs) (Goodfellow et al., 2014a). Extensive research shows that it helps stabilize the training procedure (Arjovsky & Bottou, 2017; Jenni & Favaro, 2019) and generate images of high fidelity (Karras et al., 2019a; b; Brock et al., 2018). In practice, Fig. 1 shows a significant improvement in hair quality due to noise injection. In particular, noise injection in StyleGAN (Karras et al., 2019a; b) has shown an amazing capability of helping generate sharp details in images, shedding new light on obtaining high-quality photo-realistic results using GANs. Therefore, studying the underlying principle of noise injection in GANs is an important step toward a theoretical understanding of GAN algorithms. In this paper, we propose a theoretical framework to explain and improve the effectiveness of noise injection in GANs. Our framework is motivated from a geometric perspective and combined with results on the optimal transportation problem in GANs (Lei et al., 2019a; b).
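The per-layer noise injection discussed above can be sketched as follows. This is our own minimal NumPy illustration of StyleGAN-style injection (per-pixel Gaussian noise scaled by a learned per-channel weight), not the official implementation; the function and variable names are hypothetical.

```python
import numpy as np

def inject_noise(features, weight, rng):
    """Add a single per-pixel Gaussian noise plane, scaled by a learned
    per-channel weight, to a feature map of shape (C, H, W).
    A simplified sketch of StyleGAN-style noise injection."""
    c, h, w = features.shape
    noise = rng.standard_normal((1, h, w))  # one noise plane, broadcast over channels
    return features + weight.reshape(c, 1, 1) * noise

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))   # toy feature map: 8 channels, 4x4
weight = np.full(8, 0.1)                 # learned scaling (here: a constant stand-in)
out = inject_noise(feats, weight, rng)
```

In the actual generator, such a call would be applied after each convolution, with `weight` learned jointly with the network parameters.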
Our contributions are listed as follows:
• We show that existing GAN architectures, including Wasserstein GANs (Arjovsky et al., 2017), may suffer from an adversarial dimension trap, which severely limits the capability of the generator;
• Based on our theory, we explain the properties of noise injection reported in the related literature;
• Based on our theory, we propose a more proper form of noise injection in GANs, which can overcome the adversarial dimension trap. Experiments on the state-of-the-art GAN architecture, StyleGAN2 (Karras et al., 2019b), demonstrate the superiority of our new method over the original noise injection used in StyleGAN2.

Figure 1: Noise injection significantly improves the detail quality of generated images. (Panels, left to right: increasing noise injection depth; rightmost panel: standard deviation.) From left to right, we inject extra noise into the generator layer by layer. We can see that hair quality is clearly improved. Varying the injected noise and visualizing the standard deviation over 100 different seeds, we find that detail information such as hair, parts of the background, and silhouettes is most affected, while global information such as identity and pose is less affected.

To the best of our knowledge, this is the first work that theoretically draws the geometric picture of noise injection in GANs.
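The standard-deviation visualization described in the Figure 1 caption can be sketched as follows. We use a hypothetical toy stand-in for the trained generator (our own illustration, not the paper's model): the latent fixes the global structure, and injected noise perturbs fine detail; the per-pixel standard deviation over many noise seeds then highlights the noise-driven regions.

```python
import numpy as np

def toy_generator(latent, noise):
    # Hypothetical stand-in for a trained GAN generator:
    # global structure comes from the latent, fine detail from the noise.
    base = np.outer(latent, latent)          # deterministic "content", shape (8, 8)
    return base + 0.05 * noise               # small noise-driven detail

rng = np.random.default_rng(1)
latent = rng.standard_normal(8)              # fixed latent -> fixed identity/pose
samples = np.stack([
    toy_generator(latent, rng.standard_normal((8, 8)))
    for _ in range(100)                      # 100 different noise seeds
])
per_pixel_std = samples.std(axis=0)          # high where noise drives the output
```

Pixels with large `per_pixel_std` correspond to detail regions (hair, background, silhouettes in Fig. 1); pixels dominated by the latent show near-zero deviation.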

2. RELATED WORKS

The main drawbacks of GANs are unstable training and mode collapse. Arjovsky et al. (Arjovsky & Bottou, 2017) theoretically show that injecting noise directly into the image space can smooth the data distribution and thus stabilize the training procedure. The authors of Distribution-Filtering GAN (DFGAN) (Jenni & Favaro, 2019) put this idea into practice and prove that the technique does not influence the global optimality of the real data distribution. However, as pointed out in (Arjovsky & Bottou, 2017), this method depends on the amount of noise. Our method of noise injection is essentially different from these ones. Besides, they do not provide a theoretical account of the interactions between injected noise and features. BigGAN (Brock et al., 2018) splits input latent vectors into one chunk per layer and projects each chunk to the gains and biases of batch normalization in each layer. The authors claim that this design allows direct influence on features at different resolutions and levels of the hierarchy. StyleGAN (Karras et al., 2019a) and StyleGAN2 (Karras et al., 2019b) adopt a slightly different view, where noise injection is introduced to enhance randomness for multi-scale stochastic variations. Different from the setting in BigGAN, they inject extra noise, independent of the latent inputs, into different layers of the network without projection. Our theoretical analysis is mainly motivated by the success of noise injection in StyleGAN (Karras et al., 2019a). Our proposed framework reveals that noise injection in StyleGAN is a kind of fuzzy reparameterization in Euclidean spaces, and we extend it to generic manifolds (Section 4.3).
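The BigGAN-style conditioning described above (one latent chunk per layer, each projected to batch-normalization gain and bias) can be sketched as follows. This is a hedged toy illustration with hypothetical names and randomly initialized projections; the real model learns the projection matrices.

```python
import numpy as np

def split_and_project(z, num_layers, feat_channels, rng):
    """Split a latent vector into one chunk per layer and linearly project
    each chunk to per-channel (gain, bias) pairs -- a sketch of the
    BigGAN-style conditioning described above."""
    chunks = np.array_split(z, num_layers)
    params = []
    for chunk in chunks:
        # Stand-in for a learned projection; BigGAN learns this matrix.
        w = 0.01 * rng.standard_normal((2 * feat_channels, chunk.size))
        gain_bias = w @ chunk
        gain, bias = gain_bias[:feat_channels], gain_bias[feat_channels:]
        params.append((1.0 + gain, bias))    # gains centered at 1
    return params

rng = np.random.default_rng(2)
z = rng.standard_normal(120)                 # toy latent vector
params = split_and_project(z, num_layers=4, feat_channels=16, rng=rng)
```

Each `(gain, bias)` pair would then modulate the normalized feature map of its layer, in contrast to StyleGAN's injected noise, which is sampled independently of `z` and added without any projection.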

3.1. OPTIMAL TRANSPORTATION AND DISCONTINUOUS GENERATOR

Traditional GANs with the Wasserstein distance are equivalent to the optimal transportation problem, where the optimal generator is the optimal transportation map. However, there is little chance for the optimal transportation map to be continuous unless the support of the Brenier potential is convex (Caffarelli, 1992). Considering that the Brenier potential of a Wasserstein GAN is determined by the real data distribution and the inverse map of the generator, it is highly unlikely that its support is convex. This means that the optimal generator will be discontinuous, which is a fatal limitation on the capacity of GANs. Based on this, Lei et al. (Lei et al., 2019a) further point out that traditional GANs will hardly converge, or will converge to one continuous branch of the target mapping, thus leading to mode collapse. They then propose to find the continuous Brenier potential instead of the discontinuous transportation map.
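The discontinuity can already be seen in one dimension, where the monotone (Brenier) optimal transport map has the closed form T = F_target^{-1} ∘ F_source. The sketch below (our own illustration, not from the paper) pushes a standard Gaussian onto a two-mode target whose support has a gap; the map necessarily jumps across the gap, mirroring why a continuous generator cannot realize the optimal map.

```python
import numpy as np
from math import erf

def source_cdf(x):
    """CDF of the standard normal source density."""
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def target_quantile(p):
    """Quantile function of a target uniform on [-2,-1] U [1,2]
    (equal mass in each mode, zero density in between)."""
    return -2.0 + 2.0 * p if p < 0.5 else 1.0 + 2.0 * (p - 0.5)

# Monotone optimal transport map in 1-D: T = F_target^{-1} o F_source
T = lambda x: target_quantile(source_cdf(x))

# The map jumps across the density gap at x = 0:
left, right = T(-1e-6), T(1e-6)   # approx -1.0 and +1.0
```

A continuous generator trained to match this target must either miss one mode (mode collapse) or place spurious mass inside the gap; this is the 1-D shadow of the discontinuity argument above.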



* It suffices to note that noise injection here is totally different from the research field of adversarial attacks raised in Goodfellow et al. (2014b).

