INFLUENCE ESTIMATION FOR GENERATIVE ADVERSARIAL NETWORKS

Abstract

Identifying harmful instances, whose absence in a training dataset improves model performance, is important for building better machine learning models. Although previous studies have succeeded in estimating harmful instances under supervised settings, their approaches cannot be trivially extended to generative adversarial networks (GANs). This is because previous approaches require that (i) the absence of a training instance directly affects the loss value and that (ii) the change in the loss directly measures the harmfulness of the instance for the performance of a model. In GAN training, however, neither of the requirements is satisfied: (i) the generator's loss is not directly affected by the training instances, as they are not part of the generator's training steps, and (ii) the values of GAN losses normally do not capture the generative performance of a model. To overcome these issues, (i) we propose an influence estimation method that uses the Jacobian of the gradient of the generator's loss with respect to the discriminator's parameters (and vice versa) to trace how the absence of an instance in the discriminator's training affects the generator's parameters, and (ii) we propose a novel evaluation scheme in which we assess the harmfulness of each training instance on the basis of how a GAN evaluation metric (e.g., the inception score) is expected to change due to the instance's removal. We experimentally verified that our influence estimation method correctly inferred the changes in GAN evaluation metrics. We also demonstrated that removing the identified harmful instances effectively improved the model's generative performance with respect to various GAN evaluation metrics.

1. INTRODUCTION

Generative adversarial networks (GANs), proposed by Goodfellow et al. (2014), are a powerful subclass of generative models that has been successfully applied to a number of image generation tasks (Antoniou et al., 2017; Ledig et al., 2017; Wu et al., 2016). As the applications of GANs expand, improving the generative performance of models becomes increasingly crucial. An effective approach for improving machine learning models is to identify training instances that harm the model's performance. Traditionally, statisticians manually screen a dataset for harmful instances, which misguide a model into producing biased predictions. Recent influence estimation methods (Khanna et al., 2019; Hara et al., 2019) have automated this screening for deep learning settings, in which both the dataset size and the data dimensionality are too large for users to manually identify harmful instances. Influence estimation measures the effect of removing an individual training instance on a model's prediction without the computationally prohibitive cost of retraining. These studies identified harmful instances by estimating how the loss value would change if each training instance were removed from the dataset.

Although previous studies have succeeded in identifying harmful instances in supervised settings, extending their approaches to GANs is non-trivial. These approaches require that (i) the existence or absence of a training instance directly affects a loss value, and that (ii) the decrease in the loss value represents the harmfulness of the removed training instance. In GAN training, however, neither of the requirements is satisfied. (i) As training instances are only fed into the discriminator, they only indirectly affect the generator's loss, and (ii) the changes in the losses of GAN
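To make the supervised-setting idea concrete, the following minimal sketch approximates the effect of removing a single training instance on a validation loss as a first-order change over one averaged SGD step: removing instance i perturbs the parameter update by roughly (eta/n) times its gradient, so the validation loss changes by roughly (eta/n) times the inner product of that gradient with the validation-loss gradient. The logistic-regression model, the data, and the learning rate eta here are illustrative assumptions, not part of this paper's method (which targets GANs, where this direct loss-based reasoning breaks down):

```python
import numpy as np

# Toy logistic-regression setup (all data and names here are illustrative).
rng = np.random.default_rng(0)
n, d = 20, 3
X = rng.normal(size=(n, d))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = 0.1 * rng.normal(size=d)  # current model parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, t):
    # Gradient of the logistic loss for a single instance (x, t).
    return (sigmoid(x @ w) - t) * x

# A held-out instance whose loss we want to improve.
x_val, y_val = rng.normal(size=d), 1.0

# One SGD step uses the average gradient: w' = w - (eta/n) * sum_i g_i.
# Dropping instance i removes its term, shifting w' by about +(eta/n)*g_i,
# so the validation loss changes by about (eta/n) * g_val . g_i.
eta = 0.1
g_val = grad_loss(w, x_val, y_val)
influences = np.array(
    [eta / n * g_val @ grad_loss(w, X[i], y[i]) for i in range(n)]
)

# A negative value means removing the instance is expected to *decrease*
# the validation loss, i.e. the instance is (locally) harmful.
harmful_order = np.argsort(influences)  # most harmful first
```

This first-order, single-step view is what breaks for GANs: the training instances enter only the discriminator's loss, so there is no such direct gradient term linking an instance's removal to the generator's objective.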

