FEW-SHOT CROSS-DOMAIN IMAGE GENERATION VIA INFERENCE-TIME LATENT-CODE LEARNING

Abstract

In this work, our objective is to adapt a deep generative model trained on a large-scale source dataset to multiple target domains with scarce data. Specifically, we focus on adapting a pre-trained Generative Adversarial Network (GAN) to a target domain without re-training the generator. Our method draws motivation from the fact that out-of-distribution samples can be 'embedded' onto the latent space of a pre-trained source-GAN. We propose to train a small latent-generation network during the inference stage, each time a batch of target samples is to be generated. These target latent codes are fed to the source-generator to obtain novel target samples. Despite using the same small set of target samples and the same source generator, multiple independent training episodes of the latent-generation network result in diversity among the generated target samples. Our method, albeit simple, can be used to generate data from multiple target distributions using a generator trained on a single source distribution. We demonstrate the efficacy of our surprisingly simple method in generating multiple target datasets with only a single source generator and a few target samples. The code of the proposed method is available at: https://github.com/arnabkmondal/GenDA
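The procedure described above can be illustrated with a minimal sketch: freeze a pre-trained source generator, train a small latent-generation network on a few target samples at inference time, then sample novel latent codes through it. The toy generator, network sizes, and simple reconstruction loss below are illustrative assumptions, not the paper's actual architectures or objectives.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pre-trained source GAN generator; its weights stay frozen.
source_generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
for p in source_generator.parameters():
    p.requires_grad_(False)

# Small latent-generation network trained anew at each inference episode.
latent_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))

# A few "target samples" (random stand-ins here for real few-shot data).
target_samples = torch.randn(5, 32)
z_fixed = torch.randn(5, 8)  # one noise vector per target sample

opt = torch.optim.Adam(latent_net.parameters(), lr=1e-2)
for step in range(200):
    latent_codes = latent_net(z_fixed)         # learned target latent codes
    fake = source_generator(latent_codes)      # generator stays frozen
    loss = ((fake - target_samples) ** 2).mean()  # simple embedding surrogate
    opt.zero_grad()
    loss.backward()
    opt.step()

# Novel "target" samples: fresh noise through the trained latent network.
novel = source_generator(latent_net(torch.randn(10, 8)))
print(novel.shape)
```

Re-running the training loop with a different seed corresponds to an independent episode, which is the source of sample diversity the abstract refers to.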

1. INTRODUCTION

1.1 FEW-SHOT IMAGE GENERATION

Deep generative models learn to generate novel data points from an unknown underlying distribution. The family of auto-encoder-based generative models (Kingma & Welling, 2014) uses variational inference to maximize the evidence lower bound (ELBO) on the data likelihood; adversarial generators such as GANs (Goodfellow et al., 2014) learn to sample by solving a min-max optimization game; and normalizing-flow-based methods utilize tractable transformations between the latent and data distributions (Kobyzev et al., 2020). All such models have been shown to be successful in generating high-quality realistic data such as images (Karras et al., 2018; 2019; 2020b). However, one caveat of deep generative models is that they require thousands of images for proper training, limiting the scope of what can be explored (Sushko et al., 2021). This poses practical restrictions on the applications of deep generative models, as the amount of training data is often limited to the order of hundreds, or even tens at times, making it crucial to adapt generative models to few-shot settings. One natural way to accomplish this objective is to use the 'prior knowledge' already present in a generative model built on a larger but 'close' source dataset (Wang et al., 2018; 2020b). Several ideas have been proposed, ranging from learning latent transformations (Wang et al., 2020b) to re-training generators on target data with regularizers such as elastic weight consolidation (Li et al., 2020) and cross-domain correspondence (Ojha et al., 2021) (see Section 2 for a detailed description). The basic principle in all of these is to adapt the generator of a Generative Adversarial Network (GAN) (Goodfellow et al., 2014), trained on a large source dataset, to the target dataset such that the re-trained generator imbibes the 'style' of the target while retaining the 'variability' of the source domain.
In other words, the re-training is geared towards reducing the infamous problem of catastrophic forgetting that plagues transfer learning (McCloskey & Cohen, 1989). While the aforementioned methods show good progress towards adapting a pre-trained GAN, there are still shortcomings, such as a lack of diversity due to over-fitting. Further, these methods require de-novo re-training on every new target, which possibly leads to catastrophic forgetting. In this paper, we intend to tackle some of these issues by addressing the following ques-

