GROUP EQUIVARIANT GENERATIVE ADVERSARIAL NETWORKS

Abstract

Recent improvements in generative adversarial visual synthesis incorporate transformations of real and fake images in a self-supervised setting, leading to increased stability and perceptual fidelity. However, these approaches typically handle image augmentations via additional regularizers in the GAN objective and thus spend valuable network capacity approximating transformation equivariance rather than the desired task. In this work, we explicitly incorporate inductive symmetry priors into the network architectures via group-equivariant convolutional networks. Group-convolutions have higher expressive power with fewer samples and lead to better gradient feedback between generator and discriminator. We show that group-equivariance integrates seamlessly with recent techniques for GAN training across regularizers, architectures, and loss functions. We demonstrate the utility of our methods for conditional synthesis by improving generation in the limited-data regime across symmetric imaging datasets, and even find benefits for natural images with a preferred orientation.

1. INTRODUCTION

Generative visual modeling is an area of active research, time and again finding diverse and creative applications. A prevailing approach is the generative adversarial network (GAN), wherein density estimation is implicitly approximated by a min-max game between two neural networks (Goodfellow et al., 2014). Recent GANs are capable of high-quality natural image synthesis and scale dramatically with increases in data and compute (Brock et al., 2018). However, GANs are prone to instability due to the difficulty of achieving a local equilibrium between the two networks. Frequent failure modes include one or both networks diverging, or the generator capturing only a few modes of the empirical distribution. Proposed remedies include modifying training objectives (Arjovsky et al., 2017; Jolicoeur-Martineau, 2018), hierarchical methods (Karras et al., 2017), instance selection (Sinha et al., 2019; 2020), latent optimization (Wu et al., 2019), and strongly regularizing one or both networks (Gulrajani et al., 2017; Miyato et al., 2018; Dieng et al., 2019), among others. In practice, one or more of these techniques are ultimately adapted to specific use cases. Further, limited training data empirically exacerbates instability, most often through discriminator overfitting. Recent work on GANs for small sample sizes can be roughly divided into transfer learning approaches (Wang et al., 2018; Noguchi & Harada, 2019; Mo et al., 2020; Zhao et al., 2020a) and methods which transform/augment the available training data and provide the discriminator with auxiliary tasks. For example, Chen et al. (2019) propose a multi-task discriminator which additionally predicts the degree by which an input image has been rotated, whereas Zhang et al. (2020) and Zhao et al. (2020c) incorporate consistency regularization, penalizing the discriminator towards similar activations for transformed/augmented real and fake images.
However, with consistency regularization and augmentation, network capacity is spent learning equivariance to transformations rather than the desired task, and equivariance is not guaranteed. In this work, we consider the problem of training tabula rasa on limited data which possess global and even local symmetries. We begin by noting that GANs ubiquitously use convolutional layers, which are equivariant to translation. Equivariant features may also be constructed via scattering networks consisting of non-trainable wavelet filters, enabling equivariance to diverse symmetries (Mallat, 2012; Bruna & Mallat, 2013; Sifre & Mallat, 2013). Generative scattering networks include Angles & Mallat (2018), where a standard convolutional decoder is optimized to reconstruct images from an embedding generated by a fixed scattering network, and Oyallon et al. (2019), who show preliminary results using a standard convolutional GAN to generate scattering coefficients. We note that while both approaches are promising, they currently yield synthesis results not comparable to modern GANs. Capsule networks (Hinton et al., 2011; Sabour et al., 2017) are also equivariant, and emerging work has shown that using a capsule network for the GAN discriminator (Jaiswal et al., 2019; Upadhyay & Schrater, 2018) improves synthesis on toy datasets. However, capsule GANs and generative scattering approaches require complex training strategies and restrictive architectural choices incompatible with recent insights in GAN training, and have not yet been shown to scale to real-world datasets. In this work, we improve the generative modeling of images with transformation-invariant labels by using an inductive bias of symmetry. We replace all convolutions with group-convolutions, thereby admitting a higher degree of weight sharing which enables increased visual fidelity, especially on limited-sample datasets.
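To make the extra weight sharing concrete, the following minimal NumPy sketch (our illustration, not code from the paper) implements a lifting group-convolution for the rotation group C4: a single filter is correlated with the input at all four 90-degree rotations, so one set of weights produces four orientation channels. The equivariance check at the end verifies that rotating the input rotates each feature map spatially and cyclically permutes the orientation channels.

```python
import numpy as np

def corr2d(f, w):
    """Valid 2-D cross-correlation, written with explicit loops for clarity."""
    N, k = f.shape[0], w.shape[0]
    M = N - k + 1
    out = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            out[i, j] = np.sum(f[i:i + k, j:j + k] * w)
    return out

def lift_c4(f, w):
    """Lifting group-convolution over C4: correlate f with all four
    90-degree rotations of the single filter w. One filter yields four
    orientation channels -- the additional weight sharing that
    group-convolutions provide over ordinary convolutions."""
    return np.stack([corr2d(f, np.rot90(w, k)) for k in range(4)])

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8))   # toy single-channel image
w = rng.standard_normal((3, 3))   # a single learnable filter

out = lift_c4(f, w)               # shape (4, 6, 6)
out_rot = lift_c4(np.rot90(f), w)

# Equivariance: rotating the input rotates each feature map spatially
# and cyclically shifts the orientation channels by one step.
for k in range(4):
    assert np.allclose(out_rot[k], np.rot90(out[(k - 1) % 4]))
```

In a full group-equivariant network, subsequent layers convolve over both the spatial and orientation axes; this sketch covers only the first (lifting) layer.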
To our knowledge, we are the first to use group-equivariant layers in the GAN context and to use symmetry-driven considerations in both generator and discriminator architectures. Our contributions are as follows:
1. We introduce symmetry priors via group-equivariance to generative adversarial networks.
2. We show that recent insights in improving GAN training are fully compatible with group-equivariance, given careful reformulations.
3. We improve class-conditional image synthesis across a diversity of datasets, architectures, loss functions, and regularizations. These improvements are consistent for symmetric images and even for natural images with a preferred orientation.

2.1. PRELIMINARIES

Groups and group-convolutions. A group is a set endowed with a binary operation satisfying the properties of closure, associativity, identity, and invertibility. A two-dimensional symmetry group is



Figure 1: Several image modalities have no preferred orientation for tasks such as classification. We improve their generative modeling by utilizing image symmetries within a GAN framework.
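The group axioms listed above can be checked concretely for the running example of planar rotational symmetry. The sketch below (our illustration, not from the paper) represents the cyclic group C4, the four rotations by multiples of 90 degrees, as 2x2 matrices under composition.

```python
import numpy as np

def rot(k):
    """Rotation by k * 90 degrees as a 2x2 matrix; rounding makes the
    entries exact (-1, 0, or 1) so equality checks are exact."""
    th = k * np.pi / 2
    return np.round(np.array([[np.cos(th), -np.sin(th)],
                              [np.sin(th),  np.cos(th)]]))

C4 = [rot(k) for k in range(4)]
eye = np.eye(2)

# Closure: the composition of any two elements is again in C4.
assert all(any(np.array_equal(a @ b, c) for c in C4) for a in C4 for b in C4)
# Identity: rot(0) is the identity element.
assert np.array_equal(rot(0), eye)
# Invertibility: every element has an inverse within C4.
assert all(any(np.array_equal(a @ b, eye) for b in C4) for a in C4)
# Associativity holds because matrix multiplication is associative.
```

The same checks apply verbatim to larger discrete symmetry groups, e.g. C4 extended with reflections (the dihedral group D4).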

