PRECONDITION LAYER AND ITS USE FOR GANS

Abstract

One of the major challenges in training generative adversarial nets (GANs) is instability. Spectral normalization (SN) has been remarkably successful at addressing this instability. However, SN-GAN still suffers from training instabilities, especially when working with higher-dimensional data. We find that these instabilities are accompanied by large condition numbers of the discriminator weight matrices. To improve training stability, we draw on common linear-algebra practice and employ preconditioning. Specifically, we introduce a preconditioning layer (PC-layer) that performs low-degree polynomial preconditioning. We use this PC-layer in two ways: 1) fixed preconditioning (FPC) adds a fixed PC-layer to all layers; and 2) adaptive preconditioning (APC) adaptively controls the strength of preconditioning. Empirically, we show that FPC and APC stabilize training of unconditional GANs using classical architectures. On LSUN 256 × 256 data, APC improves FID scores by around 5 points over baselines.

1. INTRODUCTION

Generative Adversarial Nets (GANs) (Goodfellow et al., 2014) successfully transform samples from one distribution to another. Nevertheless, training GANs is known to be challenging, and performance is often sensitive to hyper-parameters and datasets. Understanding the training difficulties of GANs is thus an important problem. Recent studies in neural network theory (Pennington et al., 2017; Xiao et al., 2018; 2020) suggest that the spectrum of the input-output Jacobian or neural tangent kernel (NTK) is an important metric for understanding training performance. While directly manipulating the spectrum of the Jacobian or NTK is not easy, a practical alternative is to manipulate the spectrum of the weight matrices, e.g., via orthogonal initialization (Xiao et al., 2018). For a special class of neural nets, Hu et al. (2020) showed that orthogonal initialization leads to better convergence than Gaussian initialization, providing early theoretical evidence for the importance of manipulating the weight-matrix spectrum. Motivated by these studies, we suspect that an 'adequate' weight-matrix spectrum is also important for GAN training. Indeed, one of the most popular techniques for GAN training, spectral normalization (SN) (Miyato et al., 2018), manipulates the spectrum by scaling all singular values by a constant, which ensures that the spectral norm is bounded above. However, we find that for some hyper-parameters and for high-resolution datasets, SN-GAN fails to generate good images. In these cases, the condition numbers of the weight matrices become very large and the majority of the singular values approach 0 during training; see Fig. 1(a) and Fig. 2(a). This can happen because SN does not promote a small condition number. This finding motivates us to reduce the condition number of the weights during GAN training. Recall that controlling the condition number is a central problem in numerical linear algebra, known as preconditioning (see Chen (2005)).
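The observation above can be made concrete with a small numerical sketch: a spectrally normalized weight matrix has spectral norm 1 but can still be badly conditioned, and a low-degree polynomial applied to the weights compresses the spectrum toward 1. The polynomial coefficients below (the degree-3 Newton-Schulz map σ ↦ 1.5σ − 0.5σ³) are illustrative choices for this sketch, not the coefficients used in the paper, and `strength` is a hypothetical knob mimicking adaptive preconditioning.

```python
import numpy as np


def condition_number(W: np.ndarray) -> float:
    """Ratio of the largest to the smallest singular value of W."""
    s = np.linalg.svd(W, compute_uv=False)
    return s[0] / s[-1]


def pc_layer(W: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Low-degree polynomial preconditioning of a weight matrix (sketch).

    Computes p(W W^T) W with p chosen so each singular value sigma maps to
    sigma * (1.5 - 0.5 * sigma**2), pushing the spectrum toward 1 when the
    spectral norm is at most 1 (as after spectral normalization).
    `strength` in [0, 1] blends the raw and preconditioned weights; the
    coefficients are illustrative, not the paper's.
    """
    W_pre = 1.5 * W - 0.5 * (W @ W.T) @ W
    return (1.0 - strength) * W + strength * W_pre


# Spectral normalization alone does not control the condition number;
# applying the polynomial preconditioner shrinks it.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W = W / np.linalg.svd(W, compute_uv=False)[0]  # spectral norm = 1
print(condition_number(W))            # large despite ||W||_2 = 1
print(condition_number(pc_layer(W)))  # strictly smaller
```

Because the map σ ↦ 1.5σ − 0.5σ³ is monotone on [0, 1], fixes σ = 1, and increases every σ < 1, each application strictly reduces the condition number without increasing the spectral norm.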
We hence seek to develop a "plug-in" preconditioner for the weights, which requires the preconditioner to be differentiable. Among various preconditioners, we find the polynomial preconditioner to be a suitable choice because it is simple to differentiate and has strong theoretical support from approximation theory. Further, we suggest adaptively adjusting the strength of the preconditioner during training so as not to overly restrict expressivity. We show the efficacy of preconditioning on CIFAR10 (32 × 32), STL (48 × 48), and LSUN bedroom, tower, and living room (256 × 256).

Summary of contributions. For the deep linear network studied in (Hu et al., 2020), we prove that if all weight matrices have bounded spectrum, then gradient descent converges to global min-

