AUGMENTATION-INTERPOLATIVE AUTOENCODERS FOR UNSUPERVISED FEW-SHOT IMAGE GENERATION

Anonymous

Abstract

We aim to build image generation models that generalize to new domains from few examples. To this end, we first investigate the generalization properties of classic image generators, and discover that autoencoders generalize extremely well to new domains, even when trained on highly constrained data. We leverage this insight to produce a robust, unsupervised few-shot image generation algorithm, and introduce a novel training procedure based on recovering an image from data augmentations. Our Augmentation-Interpolative AutoEncoders synthesize realistic images of novel objects from only a few reference images, and outperform both prior interpolative models and supervised few-shot image generators. Our procedure is simple and lightweight, generalizes broadly, and requires no category labels or other supervision during training.

1. INTRODUCTION

Modern generative models can synthesize high-quality (Karras et al., 2019; Razavi et al., 2019; Zhang et al., 2018a), diverse (Ghosh et al., 2018; Mao et al., 2019; Razavi et al., 2019), and high-resolution (Brock et al., 2018; Karras et al., 2017; 2019) images of any class, but only given a large training dataset for those classes (Creswell et al., 2017). This requirement of a large dataset is impractical in many scenarios. For example, an artist might want to use image generation to help create concept art of futuristic vehicles. Smartphone users may wish to animate a collection of selfies, or researchers training an image classifier might wish to generate augmented data for rare classes. These and other applications will require generative models capable of synthesizing images from a large, ever-growing set of object classes. We cannot rely on having hundreds of labeled images for all of them, and most of them will likely be unknown at training time. We therefore need generative models that can train on one set of image classes and then generalize to a new class using only a small number of new images: few-shot image generation.

Unfortunately, we find that state-of-the-art generative models cannot even represent novel classes in their latent space, let alone generate them on demand (Figure 1). Perhaps because of this generalization challenge, recent attempts at few-shot image generation rely on undesirable assumptions and compromises: they need impractically large labeled datasets of hundreds of classes (Edwards & Storkey, 2016), involve substantial computation at test time (Clouâtre & Demers, 2019), or are highly domain-specific, generalizing only across very similar classes (Jitkrittum et al., 2019). In this paper, we introduce a strong, efficient, unsupervised baseline for few-shot image generation that avoids all of the above compromises.
We leverage the finding that although the latent spaces of powerful generative models, such as VAEs and GANs, do not generalize to new classes, the representations learned by autoencoders (AEs) generalize extremely well. AEs can then be converted into generative models by training them to interpolate between seed images (Sainburg et al., 2018; Berthelot et al., 2018; Beckham et al., 2019). These Interpolative AutoEncoders (IntAEs) would seem a natural fit for few-shot image generation. Unfortunately, we also find that although IntAEs can reproduce images from novel classes, the ability to interpolate between them breaks down upon leaving the training domain. To remedy this, we introduce a new training method based on data augmentation, which produces smooth, meaningful interpolations in novel domains. We demonstrate on three different settings (handwritten characters, faces, and general objects) that our Augmentation-Interpolative AutoEncoder (AugIntAE) achieves simple, robust, highly general, and completely unsupervised few-shot image generation.
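To make the training idea concrete, the following is a minimal NumPy sketch of the augmentation-interpolative objective: encode two augmented views of an image, interpolate their latent codes, and penalize the decoder for failing to recover the original image. The linear `encode`/`decode` maps, the noise-based `augment`, and the `aug_int_loss` helper are illustrative stand-ins of our own devising, not the paper's actual convolutional networks or image augmentations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder standing in for trained conv networks.
D, Z = 64, 16                       # image and latent dimensions (illustrative)
W_enc = rng.standard_normal((Z, D)) / np.sqrt(D)
W_dec = W_enc.T                     # tied weights, for simplicity

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

def augment(x, rng):
    # Stand-in for image augmentations (crops, flips, color jitter):
    # here, simple additive noise.
    return x + 0.1 * rng.standard_normal(x.shape)

def aug_int_loss(x, alpha, rng):
    """Sketch of the AugIntAE objective: interpolate the latents of two
    augmented views, then reconstruct the clean source image."""
    z1 = encode(augment(x, rng))
    z2 = encode(augment(x, rng))
    z = alpha * z1 + (1.0 - alpha) * z2   # latent-space interpolation
    x_hat = decode(z)
    return float(np.mean((x_hat - x) ** 2))  # reconstruction loss

x = rng.standard_normal(D)                 # a stand-in "image"
loss = aug_int_loss(x, alpha=0.5, rng=rng)
```

Minimizing this loss over many images and random `alpha` values is what, in the paper's terms, encourages interpolations between nearby latent codes to land back on realistic images; at test time the same interpolation is applied to the latents of a few reference images from a novel class.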

