RANDOM NETWORK DISTILLATION AS A DIVERSITY METRIC FOR BOTH IMAGE AND TEXT GENERATION

Anonymous

Abstract

Generative models are increasingly able to produce remarkably high-quality images and text, and the community has developed numerous evaluation metrics for comparing them. However, these metrics do not effectively quantify data diversity. We develop a new diversity metric that can readily be applied to data of any type, both synthetic and natural. Our method employs random network distillation, a technique introduced in reinforcement learning. We validate and deploy this metric on both images and text. We further explore diversity in few-shot image generation, a setting that was previously difficult to evaluate.

1. INTRODUCTION

State-of-the-art generative adversarial networks (GANs) are able to synthesize images of such high quality that humans may have a difficult time distinguishing them from natural images (Brock et al., 2018; Karras et al., 2019). Not only can GANs produce pretty pictures, but they are also useful for applied tasks, from projecting noisy images onto the natural image manifold to generating training data (Samangouei et al., 2018; Sixt et al., 2018; Bowles et al., 2018). Similarly, massive transformer models are capable of performing question-answering and translation (Brown et al., 2020). In order for GANs and text generators to be valuable, they must generate diverse data rather than memorizing a small number of samples. Diverse data should contain a wide variety of semantic content, and its distribution should not concentrate around a small subset of modes from the true image distribution. A number of metrics have emerged for evaluating GAN-generated images and synthetic text. However, these metrics do not effectively quantify data diversity, and they work only on a small number of specific benchmark tasks (Salimans et al., 2016; Heusel et al., 2017). Diversity metrics for synthetic text use only rudimentary tools and measure only the similarity of phrases and vocabulary rather than semantic meaning (Zhu et al., 2018).

Our novel contributions can be summarized as follows:

• We design a framework (RND) for comparing the diversity of datasets using random network distillation. Our framework can be applied to any type of data, from images to text and beyond. RND does not suffer from common problems that have plagued evaluation of generative models, such as vulnerability to memorization, and it can even be used to evaluate the diversity of natural (not synthetic) data since it does not require a reference dataset.

• We validate the effectiveness of our method in a controlled setting by synthetically manipulating the diversity of GAN-generated images. We use the same truncation strategy employed by BigGAN to increase FID scores, and we confirm that this strategy indeed decreases diversity. This observation calls into question the usefulness of popular metrics such as FID for measuring diversity.

• We benchmark data, both synthetic and natural, using our random distillation method. In addition to evaluating the most popular ImageNet-trained generative models and popular language models, we evaluate GANs in the data-scarce regime, i.e., single-image GANs, which were previously difficult to evaluate. We also evaluate the diversity of natural data.
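To make the core idea concrete, the sketch below illustrates random network distillation as a diversity signal: a predictor network is trained to imitate a frozen, randomly initialized target network on a dataset, and the residual prediction error after training is read off as the score. A collapsed (low-diversity) dataset is easy to fit, so its residual error is low, while a spread-out dataset leaves a larger residual. This is only a minimal NumPy illustration of the principle; the network sizes, learning rate, training schedule, and the `rnd_diversity` helper are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def make_mlp(in_dim, hidden, out_dim, rng):
    # Two-layer MLP with random weights. For the target network
    # these weights stay frozen forever.
    W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim))
    return [W1, W2]

def forward(params, x):
    h = np.maximum(x @ params[0], 0.0)  # ReLU hidden layer
    return h @ params[1]

def rnd_diversity(data, epochs=200, lr=1e-2, seed=0):
    """Train a predictor to imitate a frozen random target network on
    `data`; return the residual MSE. More diverse data is harder to
    fit, so it leaves a larger residual (higher score)."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    target = make_mlp(d, 32, 8, rng)   # frozen random target
    pred = make_mlp(d, 32, 8, rng)     # trained to match the target
    y = forward(target, data)
    n = len(data)
    for _ in range(epochs):
        h = np.maximum(data @ pred[0], 0.0)
        err = h @ pred[1] - y
        # Manual gradients of mean squared error w.r.t. predictor weights.
        gW2 = h.T @ err / n
        gh = (err @ pred[1].T) * (h > 0)
        gW1 = data.T @ gh / n
        pred[1] -= lr * gW2
        pred[0] -= lr * gW1
    return float(np.mean((forward(pred, data) - y) ** 2))

# Toy check: a dataset collapsed onto a single point should score
# lower than one spread across many distinct points.
rng = np.random.default_rng(1)
diverse = rng.normal(0.0, 1.0, (256, 16))
collapsed = np.tile(rng.normal(0.0, 1.0, (1, 16)), (256, 1))
score_diverse = rnd_diversity(diverse)
score_collapsed = rnd_diversity(collapsed)
```

Because the score is measured against a fixed random function rather than a reference dataset, the same procedure applies to embeddings of images, text, or any other modality, which is what allows the metric to sidestep memorization issues that affect reference-based scores.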

