INFLUENCE ESTIMATION FOR GENERATIVE ADVERSARIAL NETWORKS

Abstract

Identifying harmful instances, whose absence in a training dataset improves model performance, is important for building better machine learning models. Although previous studies have succeeded in estimating harmful instances under supervised settings, they cannot be trivially extended to generative adversarial networks (GANs). This is because previous approaches require that (i) the absence of a training instance directly affects the loss value and that (ii) the change in the loss directly measures the harmfulness of the instance for the performance of a model. In GAN training, however, neither of the requirements is satisfied: (i) the generator's loss is not directly affected by the training instances, as they are not part of the generator's training steps, and (ii) the values of GAN losses normally do not capture the generative performance of a model. To this end, (i) we propose an influence estimation method that uses the Jacobian of the gradient of the generator's loss with respect to the discriminator's parameters (and vice versa) to trace how the absence of an instance in the discriminator's training affects the generator's parameters, and (ii) we propose a novel evaluation scheme in which we assess the harmfulness of each training instance on the basis of how a GAN evaluation metric (e.g., the inception score) is expected to change due to the removal of the instance. We experimentally verified that our influence estimation method correctly inferred the changes in GAN evaluation metrics. We also demonstrated that the removal of the identified harmful instances effectively improved the model's generative performance with respect to various GAN evaluation metrics.

1. INTRODUCTION

Generative adversarial networks (GANs), proposed by Goodfellow et al. (2014), are a powerful subclass of generative models that has been successfully applied to a number of image generation tasks (Antoniou et al., 2017; Ledig et al., 2017; Wu et al., 2016). The expansion of the applications of GANs makes improvements in the generative performance of models increasingly crucial. An effective approach for improving machine learning models is to identify training instances that harm the model performance. Traditionally, statisticians manually screen a dataset for harmful instances, which misguide a model into producing biased predictions. Recent influence estimation methods (Khanna et al., 2019; Hara et al., 2019) automated the screening of datasets for deep learning settings, in which both the dataset size and the data dimensionality are too large for users to manually determine the harmful instances. Influence estimation measures the effect of removing an individual training instance on a model's prediction without the computationally prohibitive cost of model retraining. These studies identified harmful instances by estimating how the loss value changes if each training instance is removed from the dataset.

Although previous studies have succeeded in identifying harmful instances in supervised settings, the extension of their approaches to GANs is non-trivial. Previous approaches require that (i) the existence or absence of a training instance directly affects a loss value, and that (ii) the decrease in the loss value represents the harmfulness of the removed training instance. In GAN training, however, neither of the requirements is satisfied: (i) as training instances are only fed into the discriminator, they only indirectly affect the generator's loss, and (ii) the changes in the losses of a GAN do not necessarily capture how the removed instances harm the generative performance.
This is because the ability of the loss to evaluate the generator is highly dependent on the performance of the discriminator. To this end, (i) we propose an influence estimation method that uses the Jacobian of the gradient of the discriminator's loss with respect to the generator's parameters (and vice versa), which traces how the absence of an instance in the discriminator's training affects the generator's parameters. In addition, (ii) we propose a novel evaluation scheme that judges whether an instance is harmful on the basis of its influence on a GAN evaluation metric, that is, how a GAN evaluation metric (e.g., the inception score (Salimans et al., 2016)) changes if a given training instance is removed from the dataset. We identify harmful instances by estimating this influence with our influence estimation method. We verified that the proposed influence estimation method correctly estimated the influence on GAN evaluation metrics across different settings of the dataset, model architecture, and GAN evaluation metric. We also demonstrated that removing harmful instances identified by the proposed method effectively improved various GAN evaluation metrics.

Our contributions are summarized as follows:
• We propose an influence estimation method that uses the Jacobian of the gradient of the discriminator's loss with respect to the generator's parameters (and vice versa), which traces how the absence of an instance in the discriminator's training affects the generator's parameters.
• We propose a novel evaluation scheme that judges whether an instance is harmful on the basis of its influence on GAN evaluation metrics rather than on the loss value, and leverages the proposed influence estimation method to identify harmful instances.
• We experimentally verified that our influence estimation method correctly inferred the influence on GAN evaluation metrics. Further, we demonstrated that the removal of the harmful instances suggested by the proposed method effectively improved the generative performance with respect to various GAN evaluation metrics.
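To illustrate point (i) concretely, the sketch below (a toy example in plain NumPy with hypothetical one-parameter models, not the paper's method) trains a minimal least-squares GAN with and without a single training instance. The instance enters only the discriminator's loss, yet leaving it out still shifts the generator's parameter through the alternating updates:

```python
import numpy as np

def train_toy_gan(data, steps=200, lr=0.02):
    """Alternating gradient descent for a toy least-squares GAN with
    hypothetical one-parameter models: G(z) = theta_g (a constant
    generator) and D(x) = theta_d * x (a linear discriminator)."""
    theta_g, theta_d = 0.0, 0.1
    for _ in range(steps):
        # Discriminator step: L_D = mean_x (D(x) - 1)^2 + (D(G(z)) + 1)^2.
        # The real instances enter ONLY this loss.
        grad_d = np.mean(2 * (theta_d * data - 1) * data) \
                 + 2 * (theta_d * theta_g + 1) * theta_g
        theta_d -= lr * grad_d
        # Generator step: L_G = (D(G(z)) - 1)^2.
        # No training instance appears here; the data affects theta_g
        # only indirectly, through the updated theta_d.
        grad_g = 2 * (theta_d * theta_g - 1) * theta_d
        theta_g -= lr * grad_g
    return theta_g

data = np.array([1.0, 1.1, 0.9, 5.0])   # 5.0 plays the role of an outlier
g_full = train_toy_gan(data)            # train on all instances
g_loo = train_toy_gan(data[:-1])        # retrain with the outlier left out
print(g_full, g_loo)                    # the two runs end at different theta_g
```

Naively retraining for every leave-one-out candidate, as done here, is exact but computationally prohibitive for real models; this is the cost that influence estimation avoids.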

2. PRELIMINARIES

Notation  For column vectors $a, b \in \mathbb{R}^p$, we denote the inner product by $\langle a, b \rangle = \sum_{i=1}^{p} a_i b_i$. For a function $f(a)$, we denote its gradient with respect to $a$ by $\nabla_a f(a)$. We denote the identity matrix of size $p$ by $I_p$, the zero vector of length $p$ by $0_p$, and the ones vector of length $p$ by $1_p$.

Generative Adversarial Networks (GAN)  For simplicity, we consider an unconditional GAN that consists of the generator $G : \mathbb{R}^{d_z} \to \mathbb{R}^{d_x}$ and the discriminator $D : \mathbb{R}^{d_x} \to \mathbb{R}$, where $d_z$ and $d_x$ are the numbers of dimensions of the latent variable $z \sim p(z)$ and the data point $x \sim p(x)$, respectively. The parameters of the generator $\theta_G \in \mathbb{R}^{d_G}$ and the discriminator $\theta_D \in \mathbb{R}^{d_D}$ are learned through adversarial training; $G$ tries to sample realistic data while $D$ tries to identify whether the data is real or generated.
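The coupling between the two players can be made concrete with a toy objective. Assuming a hypothetical scalar least-squares generator loss $L_G(\theta_G, \theta_D) = (\theta_D \theta_G - 1)^2$ (an illustration, not the paper's exact setup), the sketch below checks by central finite differences that the generator's gradient $\nabla_{\theta_G} L_G$ itself depends on $\theta_D$, i.e., that the cross Jacobian $\nabla_{\theta_D} \nabla_{\theta_G} L_G$ is nonzero. Quantities of this kind let one trace how a change in the discriminator's parameters propagates to the generator's updates:

```python
def grad_g(theta_g, theta_d):
    """Gradient of the toy loss L_G = (theta_d * theta_g - 1)^2
    with respect to the generator parameter theta_g."""
    return 2 * (theta_d * theta_g - 1) * theta_d

def cross_jacobian_fd(theta_g, theta_d, h=1e-5):
    """Central finite difference of grad_g with respect to theta_d,
    i.e., an estimate of the cross Jacobian d/d(theta_d) [dL_G/d(theta_g)]."""
    return (grad_g(theta_g, theta_d + h) - grad_g(theta_g, theta_d - h)) / (2 * h)

# Analytic cross derivative: d/d(theta_d) [2 * (theta_d*theta_g - 1) * theta_d]
#                          = 4 * theta_g * theta_d - 2
print(cross_jacobian_fd(0.5, 0.3))   # ≈ -1.4, matching 4*0.5*0.3 - 2
```

Because the toy loss is quadratic in $\theta_D$, the central difference here agrees with the analytic cross derivative up to floating-point rounding.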

Formulation of GAN Objectives

For the latter part of this paper, we use a coupled parameter vector $\theta := (\theta_G, \theta_D) \in \mathbb{R}^{d_\theta}$, $d_\theta = d_G + d_D$, when we refer to the whole parameters of the GAN. For generality, we adopt the formulation of Gidel et al. (2019), in which $G$ and $D$ try to minimize $L_G$ and $L_D$, respectively, to obtain the following Nash equilibrium $(\theta_G^*, \theta_D^*)$:

$$\theta_G^* \in \arg\min_{\theta_G} L_G(\theta_G, \theta_D^*) \quad \text{and} \quad \theta_D^* \in \arg\min_{\theta_D} L_D(\theta_G^*, \theta_D). \tag{1}$$

In this paper, we assume that $L_G$ and $L_D$ have the following forms:

$$L_G(\theta) := \mathbb{E}_{z \sim p(z)}\left[f_G(z; \theta)\right], \qquad L_D(\theta) := \mathbb{E}_{z \sim p(z)}\left[f_D^{[z]}(z; \theta)\right] + \mathbb{E}_{x \sim p(x)}\left[f_D^{[x]}(x; \theta)\right]. \tag{2}$$

This covers the common settings of GAN objectives: the non-zero-sum game proposed by Goodfellow et al. (2014), the Wasserstein distance (Arjovsky et al., 2017), and the least squares loss (Mao et al., 2017).

Code is available at https://github.com/hitachi-rd-cv/influence-estimation-for-gans
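As one concrete instantiation of the forms in Eq. (2), consider the non-saturating objective of Goodfellow et al. (2014): $f_G(z; \theta) = -\log D(G(z))$, $f_D^{[z]}(z; \theta) = -\log(1 - D(G(z)))$, and $f_D^{[x]}(x; \theta) = -\log D(x)$. The sketch below (with hypothetical linear toy networks) estimates the two expectations by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical toy networks: a linear generator and a linear discriminator.
def G(z, theta_g):
    return theta_g * z

def D(x, theta_d):
    return sigmoid(theta_d * x)

def losses(theta_g, theta_d, z, x):
    """Monte Carlo estimates of L_G and L_D in the form of Eq. (2),
    instantiated with the non-saturating GAN objective."""
    f_g = -np.log(D(G(z, theta_g), theta_d))          # f_G(z; theta)
    f_d_z = -np.log(1.0 - D(G(z, theta_g), theta_d))  # f_D^[z](z; theta)
    f_d_x = -np.log(D(x, theta_d))                    # f_D^[x](x; theta)
    return f_g.mean(), f_d_z.mean() + f_d_x.mean()

z = rng.standard_normal(1000)   # latent samples z ~ p(z)
x = rng.standard_normal(1000)   # "real" data x ~ p(x)

L_G, L_D = losses(theta_g=1.0, theta_d=0.0, z=z, x=x)
print(L_G, L_D)   # ≈ 0.693, 1.386
```

With $\theta_D = 0$ the discriminator outputs $1/2$ for every input, so the estimates reduce to $\log 2$ and $2 \log 2$ exactly, independent of the samples, which gives a quick sanity check.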

