GROUP EQUIVARIANT GENERATIVE ADVERSARIAL NETWORKS

Abstract

Recent improvements in generative adversarial visual synthesis incorporate transformations of real and fake images in a self-supervised setting, leading to increased stability and perceptual fidelity. However, these approaches typically apply image augmentations via additional regularizers in the GAN objective and thus spend valuable network capacity on approximating transformation equivariance rather than on the desired task. In this work, we explicitly incorporate inductive symmetry priors into the network architectures via group-equivariant convolutional networks. Group convolutions have higher expressive power with fewer samples and lead to better gradient feedback between generator and discriminator. We show that group equivariance integrates seamlessly with recent techniques for GAN training across regularizers, architectures, and loss functions. We demonstrate the utility of our methods for conditional synthesis by improving generation in the limited-data regime across symmetric imaging datasets, and we even find benefits for natural images with a preferred orientation.

1. INTRODUCTION

Generative visual modeling is an area of active research, time and again finding diverse and creative applications. A prevailing approach is the generative adversarial network (GAN), wherein density estimation is implicitly approximated by a min-max game between two neural networks (Goodfellow et al., 2014). Recent GANs are capable of high-quality natural image synthesis and scale dramatically with increases in data and compute (Brock et al., 2018). However, GANs are prone to instability due to the difficulty of achieving a local equilibrium between the two networks. Frequent failures include one or both networks diverging, or the generator capturing only a few modes of the empirical distribution. Proposed remedies include modifying training objectives (Arjovsky et al., 2017; Jolicoeur-Martineau, 2018), hierarchical methods (Karras et al., 2017), instance selection (Sinha et al., 2019; 2020), latent optimization (Wu et al., 2019), and strongly regularizing one or both networks (Gulrajani et al., 2017; Miyato et al., 2018; Dieng et al., 2019), among others. In practice, one or several of the above techniques are ultimately adapted to specific use cases. Further, limited data empirically exacerbates training instability, most often due to discriminator overfitting. Recent work on GANs for small sample sizes can be roughly divided into transfer learning approaches (Wang et al., 2018; Noguchi & Harada, 2019; Mo et al., 2020; Zhao et al., 2020a) and methods which transform or augment the available training data and provide the discriminator with auxiliary tasks. For example, Chen et al. (2019) propose a multi-task discriminator which additionally predicts the degree by which an input image has been rotated, whereas Zhang et al. (2020); Zhao et al. (2020c) incorporate consistency regularization, where the discriminator is penalized towards similar activations for transformed/augmented real and fake images.
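To make the consistency-regularization idea concrete, the penalty described above can be sketched as a squared difference between the discriminator's outputs on an image and on its transformed copy. The following is a minimal illustrative sketch, not the cited papers' actual setup: the linear scoring function standing in for the discriminator and the choice of a 90-degree rotation as the transformation T are assumptions made purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a discriminator: a fixed linear scoring
# function over an 8x8 image (a real D would be a convolutional network).
W = rng.normal(size=(8, 8))

def discriminator(x):
    return float(np.sum(W * x))

def augment(x):
    # The transformation T applied for consistency regularization;
    # here a 90-degree rotation, chosen as an illustrative example.
    return np.rot90(x)

def consistency_penalty(x):
    # Penalize the discriminator for responding differently to an
    # image and its transformed copy: |D(x) - D(T(x))|^2.
    return (discriminator(x) - discriminator(augment(x))) ** 2

x = rng.normal(size=(8, 8))
print(consistency_penalty(x))
```

Note that the penalty only vanishes when the discriminator happens to respond identically to x and T(x); equivariance/invariance must be learned and is not guaranteed by the architecture, which is the limitation the present work addresses.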
However, with consistency regularization and augmentation, network capacity is spent learning transformation equivariance rather than the desired task, and equivariance is still not guaranteed. In this work, we consider the problem of training tabula rasa on limited data which possess global and even local symmetries. We begin by noting that GANs ubiquitously use convolutional layers

