GANS CAN PLAY LOTTERY TICKETS TOO

Abstract

Deep generative adversarial networks (GANs) have gained growing popularity in numerous scenarios, but usually suffer from high parameter complexity, which hinders resource-constrained real-world applications. However, the compression of GANs has been less explored. A few works show that heuristically applying compression techniques normally leads to unsatisfactory results, due to the notorious training instability of GANs. In parallel, the lottery ticket hypothesis has shown prevailing success on discriminative models, in locating sparse matching subnetworks capable of training in isolation to full model performance. In this work, we for the first time study the existence of such trainable matching subnetworks in deep GANs. For a range of GANs, we consistently find matching subnetworks at 67%-74% sparsity. We observe that pruning the discriminator has only a minor effect on the existence and quality of matching subnetworks, while the initialization weights used in the discriminator play a significant role. We then show the powerful transferability of these subnetworks to unseen tasks. Furthermore, extensive experimental results demonstrate that our found subnetworks substantially outperform previous state-of-the-art GAN compression approaches in both image generation (e.g., SNGAN) and image-to-image translation (e.g., CycleGAN).

1. INTRODUCTION

Generative adversarial networks (GANs) have been successfully applied to many fields such as image translation (Jing et al., 2019; Isola et al., 2017; Liu & Tuzel, 2016; Shrivastava et al., 2017; Zhu et al., 2017) and image generation (Miyato et al., 2018; Radford et al., 2016; Gulrajani et al., 2017; Arjovsky et al., 2017). However, they are often heavily parameterized and require intensive computation at both the training and inference phases. Network compression techniques (LeCun et al., 1990; Wang et al., 2019; 2020b; Li et al., 2020) can help at inference by reducing the number of parameters or the memory usage; nonetheless, this saving does not come for free. Although these techniques strive to maintain performance after compressing the model, a non-negligible drop in generative capacity is usually observed. A question is therefore raised: is there any way to compress a GAN model while preserving or even improving its performance? The lottery ticket hypothesis (LTH) (Frankle & Carbin, 2019) provides a positive answer with matching subnetworks (Chen et al., 2020b). It states that dense models contain matching subnetworks that can be trained to reach a test accuracy comparable to the full model within a similar number of training iterations. The hypothesis has shown success in various fields (Yu et al., 2020; Renda et al., 2020; Chen et al., 2020b), and its properties have been studied widely (Malach et al., 2020; Pensia et al., 2020; Elesedy et al., 2020). However, it has never been extended to GANs, and the existence of matching subnetworks in generative adversarial networks therefore remains an open question. To address this gap in the literature, we investigate the lottery ticket hypothesis in GANs.
The most critical challenge of extending LTH to GANs emerges: how to deal with the discriminator while compressing the generator, including (i) whether to prune the discriminator simultaneously and (ii) what initialization the discriminator should adopt during re-training. Previous GAN compression methods (Shu et al., 2019; Wang et al., 2019; Li et al., 2020; Wang et al., 2020b) prune the generator only, since they aim at reducing parameters in the inference stage. The effect of pruning the discriminator has never been studied by these works; it is unnecessary for them but possibly essential for finding matching subnetworks. This is because finding matching subnetworks involves re-training the whole GAN, in which an imbalance between generative and discriminative power could degrade training results. For the same reason, a disequilibrium between the initializations used in the generator and the discriminator incurs severe training instability and unsatisfactory results. Another attractive property of LTH is the powerful transferability of the located matching subnetworks. Although it has been well studied in discriminative models (Mehta, 2019; Morcos et al., 2019; Chen et al., 2020b), an in-depth understanding of transfer learning with GAN tickets is still missing. In this work, we not only show whether the sparse matching subnetworks in GANs transfer across multiple datasets but also study which initialization benefits transferability most. To convert the parameter efficiency of LTH into computational savings, we also utilize channel pruning (He et al., 2017) to find structural matching subnetworks of GANs, which enjoy the bonus of accelerated training and inference. Our contributions can be summarized in the following four aspects:
• Using unstructured magnitude pruning, we identify matching subnetworks at 74% sparsity in SNGAN (Miyato et al., 2018) and 67% in CycleGAN (Zhu et al., 2017). The matching subnetworks in GANs exist regardless of whether discriminators are pruned, while the initialization weights used in the discriminator are crucial.
• We show that the matching subnetworks found by iterative magnitude pruning outperform subnetworks extracted by random pruning and random initialization in terms of extreme sparsity and performance. To fully exploit the trained discriminator, we use the dense discriminator as a distillation source and further improve the quality of winning tickets.
• We demonstrate that the found subnetworks in GANs transfer well across diverse generative tasks.
• The matching subnetworks found by channel pruning surpass previous state-of-the-art GAN compression methods (i.e., GAN Slimming (Wang et al., 2020b)) in both efficiency and performance.
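The iterative magnitude pruning (IMP) loop behind these contributions can be summarized in a few lines. The following is a minimal, framework-agnostic sketch (not the actual training code): each round trains the masked network, prunes the lowest-magnitude fraction of the still-surviving weights, and rewinds the survivors to their original initialization. The `train_fn` argument stands in here for a full GAN training run.

```python
import numpy as np

def prune_mask(weights, mask, rate):
    """Zero out the lowest-magnitude `rate` fraction of the still-active weights."""
    alive = np.abs(weights[mask == 1])
    k = int(rate * alive.size)
    if k == 0:
        return mask.copy()
    threshold = np.sort(alive)[k - 1]
    # already-pruned entries are zero, so they stay below the threshold
    return mask * (np.abs(weights) > threshold)

def imp(theta0, train_fn, rounds, rate=0.2):
    """Iterative magnitude pruning with weight rewinding (LTH-style).

    Each round: train the masked subnetwork, prune by magnitude, then
    rewind surviving weights to the original initialization theta0.
    Returns the final mask; the ticket is the pair (theta0, mask).
    """
    mask = np.ones_like(theta0)
    for _ in range(rounds):
        theta = train_fn(theta0 * mask) * mask  # train the current subnetwork
        mask = prune_mask(theta, mask, rate)    # remove 20% of survivors
    return mask
```

With the default `rate=0.2`, the surviving fraction after `n` rounds is 0.8^n, which is how sparsity levels like the 67%-74% reported above are reached gradually rather than in one shot.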

2. RELATED WORK

GAN Compression Generative adversarial networks (GANs) have succeeded in computer vision fields, for example, image generation and translation. One significant drawback of these generative models is the high computational cost of their complex structure. A wide range of neural network compression techniques has been applied to generative models to address this problem. There are several categories of compression techniques, including pruning (removing some parameters), quantization (reducing the bit width), and distillation. Shu et al. (2019) proposed a channel pruning method for CycleGAN using a co-evolution algorithm. Wang et al. (2019) proposed a quantization method for GANs based on the EM algorithm. Li et al. (2020) used a distillation method to transfer knowledge from the dense model to the compressed one. Recently, Wang et al. (2020b) proposed a GAN compression framework, GAN Slimming, that integrates the above three mainstream compression techniques into a unified form. Previous works on GAN pruning usually aim at finding a sparse structure of the trained generator for faster inference, whereas we focus on finding trainable structures of GANs following the lottery ticket hypothesis. Moreover, existing GAN compression methods prune only the generator, which could undermine re-training: the untouched discriminator may have stronger capacity than the pruned generator and thus cause degraded results due to the disparity between the two models.

The Lottery Ticket Hypothesis The lottery ticket hypothesis (LTH) (Frankle & Carbin, 2019) claims the existence of sparse, separately trainable subnetworks in a dense network.
These subnetworks are capable of reaching comparable or even better performance than the full dense model, which has been evidenced in various fields, such as image classification (Frankle & Carbin, 2019; Liu et al., 2019; Wang et al., 2020a; Evci et al., 2019; Frankle et al., 2020; Savarese et al., 2020; Yin et al., 2020; You et al., 2020; Ma et al., 2021; Chen et al., 2020a), natural language processing (Gale et al., 2019; Chen et al., 2020b), reinforcement learning (Yu et al., 2020), lifelong learning (Chen et al., 2021b), graph neural networks (Chen et al., 2021a), and adversarial robustness (Cosentino et al., 2019). Most works on LTH use unstructured weight magnitude pruning (Han et al., 2016; Frankle & Carbin, 2019).
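In contrast to such unstructured pruning, the channel pruning we adopt for structural subnetworks removes whole convolutional filters, so the pruned tensor actually shrinks and yields real training- and inference-time savings. A common selection criterion ranks output channels by the L1 norm of their filters; the following numpy sketch is illustrative only (the layer shape and function name are ours, not from any cited method):

```python
import numpy as np

def channel_mask(conv_weight, keep_ratio):
    """Keep the `keep_ratio` fraction of output channels with the largest
    L1 filter norm; conv_weight has shape (out_ch, in_ch, kH, kW)."""
    norms = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * norms.size)))
    keep = np.argsort(norms)[-n_keep:]      # indices of the strongest channels
    mask = np.zeros(norms.size, dtype=bool)
    mask[keep] = True
    return mask

# pruning a layer removes whole filters, shrinking the weight tensor:
w = np.random.default_rng(1).normal(size=(8, 3, 3, 3))
m = channel_mask(w, keep_ratio=0.5)
w_pruned = w[m]  # shape (4, 3, 3, 3): genuine FLOP and memory savings
```

Unlike a binary mask over individual weights, the removed channels disappear from the tensor entirely, which is why structured tickets accelerate both re-training and inference.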

