INTERPRETING CLASS CONDITIONAL GANS WITH CHANNEL AWARENESS

Abstract

Understanding the mechanism of generative adversarial networks (GANs) helps us better use GANs for downstream applications. Existing efforts mainly target interpreting unconditional models, leaving largely unexplored how a conditional GAN learns to render images of various categories. This work fills in this gap by investigating how a class conditional generator unifies the synthesis of multiple classes. For this purpose, we dive into the widely used class-conditional batch normalization (CCBN) and observe that each feature channel is activated to varying degrees given different categorical embeddings. To describe this phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis. Extensive evaluations and analyses of the BigGAN model pre-trained on ImageNet reveal that only a subset of channels is primarily responsible for the generation of a particular category, that similar categories (e.g., cat and dog) usually relate to some of the same channels, and that some channels turn out to share information across all classes. Beyond interpretation, our method enables several novel applications with conditional GANs. Concretely, we achieve (1) versatile image editing by simply altering a single channel and manage to (2) harmoniously hybridize two different classes. We further verify that the proposed channel awareness shows promising potential in (3) segmenting the synthesized image and (4) evaluating the category-wise synthesis performance. Code will be made publicly available.

1. INTRODUCTION

The past few years have witnessed the rapid advancement of generative adversarial networks (GANs) in image synthesis (Karras et al., 2021; Brock et al., 2019). Despite the wide range of applications powered by GANs, such as image-to-image translation (Isola et al., 2017), super-resolution (Chan et al., 2021; Menon et al., 2020), and image editing (Ling et al., 2021), each new task typically requires learning a separate model, which can be time- and resource-consuming. Some recent studies have confirmed that a well-trained GAN model naturally supports various downstream applications, benefiting from the rich knowledge learned in the training process (Bau et al., 2019; Shen et al., 2020). Therefore, to make sufficient use of a GAN, it becomes crucial to explore and further exploit its internal knowledge.

Many attempts have been made to understand the generation mechanism of GANs. It has been revealed that, to produce a plausible synthesis, the generator is required to render multi-level semantics, such as overall attributes (e.g., the gender of a face image) (Shen et al., 2020), the objects inside (e.g., the bed in a bedroom image) (Bau et al., 2019; Yang et al., 2020), the part-whole organization (e.g., the segmentation of the synthesis) (Zhang et al., 2021), etc. However, existing efforts mainly focus on interpreting unconditional GANs, leaving conditional generation as a black box. Compared with unconditional models, a class conditional model is more informative and efficient in that it unifies the synthesis of multiple categories, such as animals, vehicles, and scenes (Brock et al., 2019). Figuring out how it manages the class information holds great potential yet remains rarely explored. To fill in this gap, we take a close look at the popular class-conditional batch normalization (CCBN) (Brock et al., 2019), which is one of the core modules distinguishing conditional generators from unconditional ones.
Concretely, CCBN learns category-specific parameters to scale and shift the input features, such that output features derived from different class embeddings can be easily told apart from each other, eventually resulting in the synthesis of various categories. We notice from such a process that, viewed through the ReLU activation (Nair & Hinton, 2010) following CCBN, different feature channels present varying behaviors given different embeddings. To quantify this channel effect, we propose channel awareness, which characterizes how a single channel contributes to the final synthesis. Through in-depth analyses of the BigGAN (Brock et al., 2019) model pre-trained on ImageNet (Deng et al., 2009), we arrive at the following key findings, which are also illustrated in Fig. 1b. First, only a portion of channels are active in rendering images for a particular class, while the remaining channels barely affect the generation. Second, more similar categories tend to share more relevant channels. For instance, the channels relevant to dog synthesis intersect with those of cats but are disjoint from those of buses. Third, some channels respond strongly to the latent code instead of the class embedding and hence appear to deliver knowledge to all classes. Beyond model interpretation, our proposed channel awareness facilitates a range of novel applications with class conditional GANs, as shown in Fig. 1c. First, after identifying the relevant channels through awareness ranking, we realize versatile image editing by simply altering a single feature channel (Sec. 5.1). Second, by mixing the channels related to two different classes, we achieve harmonious category hybridization (Sec. 5.2). Third, we verify that intermediate feature maps from the generator, after being weighted by our channel awareness, can be convincingly used for fine-grained semantic segmentation (Sec. 5.3). Fourth, we empirically demonstrate the potential of our channel awareness in evaluating the category-wise synthesis performance (Sec. 5.4).
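To make the CCBN mechanism concrete, the following is a minimal NumPy sketch, not the implementation used in this paper or in BigGAN: a class embedding is linearly projected to per-channel scale (gamma) and shift (beta), so that, after the subsequent ReLU, each channel's activity depends on the conditioning class. The `awareness_proxy` below (mean post-ReLU response per channel) is only an illustrative stand-in for the channel awareness defined later; all dimensions, names, and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the real BigGAN generator uses far larger ones.
num_classes, embed_dim, channels, batch, H, W = 4, 8, 16, 6, 4, 4

# Hypothetical "learned" parameters: class embeddings plus linear maps
# that turn an embedding into per-channel scale (gamma) and shift (beta).
class_embed = rng.normal(size=(num_classes, embed_dim))
W_gamma = 0.5 * rng.normal(size=(embed_dim, channels))
W_beta = 0.5 * rng.normal(size=(embed_dim, channels))

def ccbn_relu(x, y):
    """Simplified class-conditional batch norm followed by ReLU.

    x: features of shape (batch, channels, H, W)
    y: integer class labels of shape (batch,)
    """
    # Ordinary batch-norm statistics over batch and spatial dimensions.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + 1e-5)
    # Category-specific scale and shift derived from the class embedding.
    e = class_embed[y]                          # (batch, embed_dim)
    gamma = 1.0 + e @ W_gamma                   # (batch, channels)
    beta = e @ W_beta                           # (batch, channels)
    out = gamma[:, :, None, None] * x_hat + beta[:, :, None, None]
    return np.maximum(out, 0.0)                 # ReLU

def awareness_proxy(x, y):
    """Mean post-ReLU response per channel: a crude illustrative proxy
    for how strongly each channel is activated under a given class."""
    return ccbn_relu(x, y).mean(axis=(0, 2, 3))  # (channels,)

x = rng.normal(size=(batch, channels, H, W))
cls_a = np.zeros(batch, dtype=int)   # every sample conditioned on class 0
cls_b = np.ones(batch, dtype=int)    # every sample conditioned on class 1
print(awareness_proxy(x, cls_a))     # per-channel activity under class 0
print(awareness_proxy(x, cls_b))     # differs: channel activity is class-dependent
```

Because gamma and beta depend on the class embedding, the same input features yield different per-channel activity under different classes, which is exactly the behavior the proposed channel awareness quantifies.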

2. RELATED WORK

Among various types of generative models, such as variational auto-encoders (VAEs) (Kingma & Welling, 2013; Razavi et al., 2019), flow-based models (Kingma & Dhariwal, 2018), and diffusion models (Ho et al., 2020; Dhariwal & Nichol, 2021), GANs (Goodfellow et al., 2014) have received wide attention due to their impressive performance on both unconditional synthesis (Karras et al., 2019; 2020; 2021) and conditional synthesis (Zhang et al., 2019; Brock et al., 2019; Sauer et al., 2022). Early studies on interpreting GANs (Bau et al., 2019; Shen et al., 2020) suggest that a well-learned GAN generator has encoded rich knowledge that can be promisingly applied to various



Figure 1: Novel applications enabled by interpreting class conditional GANs. Given a conditional generator in (a), we propose channel awareness to quantify the contribution of each feature channel to the output image, as shown in (b), which reveals how the categorical information is handled by different channels. Red, green, and blue channels are primarily responsible for the synthesis of a particular category, while yellow ones are shared by all classes. (c) Such an interpretation facilitates a range of applications, including single-channel image editing, category hybridization, fine-grained semantic segmentation, and category-wise synthesis performance evaluation. (Zoom in for a better view.)

