MITIGATING MODE COLLAPSE BY SIDESTEPPING CATASTROPHIC FORGETTING

Abstract

Generative Adversarial Networks (GANs) are a class of generative models used for a wide range of applications, but they are known to suffer from the mode collapse problem, in which some modes of the target distribution are ignored by the generator. An investigative study using a new data generation procedure indicates that mode collapse of the generator is driven by the discriminator's inability to maintain classification accuracy on previously seen samples, a phenomenon known as Catastrophic Forgetting in continual learning. Motivated by this observation, we introduce a novel training procedure that dynamically spawns additional discriminators to remember previous modes of generation. On several datasets, we show that our training scheme can be plugged into existing GAN frameworks to mitigate mode collapse and improve standard metrics for GAN evaluation.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are an extremely popular class of generative models used not only for text and image generation, but also in various fields of science and engineering, including biomedical imaging (Yi et al., 2019; Nie et al., 2018; Wolterink et al., 2017), autonomous driving (Hoffman et al., 2018; Zhang et al., 2018), and robotics (Rao et al., 2020; Bousmalis et al., 2018). However, GANs are widely known to be prone to mode collapse, a situation where the generator samples only a few modes of the real data, failing to faithfully capture other more complex or less frequent categories. While the mode collapse problem is often overlooked in text and image generation tasks, and even traded off for higher realism of individual samples (Karras et al., 2019; Brock et al., 2019), dropping infrequent classes can cause serious problems in real-world applications, in which the infrequent classes represent important anomalies. For example, a collapsed GAN can produce racially or gender-biased images (Menon et al., 2020). Moreover, mode collapse causes instability in optimization, which can damage not only the diversity but also the realism of individual samples in the final results. As an example, we visualize the training progression of the vanilla GAN (Goodfellow et al., 2014) on a simple bimodal distribution in the top row of Figure 1. At collapse, the discriminator conveniently assigns high realism to the region unoccupied by the generator, regardless of the true density of the real data. This produces a strong gradient for the generator to move its samples toward the dropped mode, swinging mode collapse to the opposite side. In particular, the discriminator loses its ability to detect fake samples it was previously able to, such as point X. The oscillation continues without convergence.
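To make the bimodal setting concrete, the sketch below samples such a two-mode target distribution and measures per-mode coverage of a set of generated samples, the kind of probe that reveals mode dropping. This is an illustrative NumPy sketch; the mode locations, widths, and the nearest-mode assignment rule are our assumptions, not details from the paper.

```python
import numpy as np

def sample_bimodal(n, mu1=-2.0, mu2=2.0, sigma=0.2, seed=0):
    """Sample n points from an equal-weight mixture of two 1-D Gaussians."""
    rng = np.random.default_rng(seed)
    modes = rng.integers(0, 2, size=n)            # pick a mode for each sample
    centers = np.where(modes == 0, mu1, mu2)
    return centers + sigma * rng.normal(size=n)

def mode_coverage(samples, mu1=-2.0, mu2=2.0):
    """Fraction of samples nearest to each mode (a crude mode-dropping probe)."""
    nearer_first = np.abs(samples - mu1) < np.abs(samples - mu2)
    p1 = nearer_first.mean()
    return p1, 1.0 - p1

real = sample_bimodal(10_000)
collapsed = sample_bimodal(10_000, mu1=2.0, mu2=2.0)  # generator stuck on one mode

print(mode_coverage(real))       # roughly (0.5, 0.5): both modes covered
print(mode_coverage(collapsed))  # roughly (0.0, 1.0): left mode dropped
```

A healthy generator keeps both coverage fractions near 0.5, while a collapsed one concentrates all mass on a single mode.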
From this observation, we hypothesize that the mode collapse problem in GAN training is closely related to Catastrophic Forgetting (McCloskey & Cohen, 1989; McClelland et al., 1995; Ratcliff, 1990) in continual learning. That is, since the distribution of the generated samples is not stationary, the discriminator forgets to classify previously generated samples as fake, hindering convergence of the GAN minimax game. A promising line of work (Zhang et al., 2019b; Rajasegaran et al., 2019; Rusu et al., 2016; Fernando et al., 2017) tackles the problem in the supervised learning setting by instantiating multiple predictors, each of which takes charge of a particular subset of the whole distribution. Likewise, we tackle the problem of mode collapse in GANs by tracking the severity of Catastrophic Forgetting through a few exemplar data points stored during training, and dynamically spawning an additional discriminator when forgetting is detected, as shown in Figure 1. The key idea is that the added discriminator is left intact until the generator recovers from mode dropping of that sample, essentially sidestepping catastrophic forgetting. While the mode collapse problem has been tackled by many previous works, as discussed in Section 2, we show that our approach based on Catastrophic Forgetting can be added to any existing GAN framework and is the most effective in preventing mode collapse. Furthermore, the improved stability of training boosts the standard metrics of popular GAN frameworks. To summarize, our contributions are:

• We propose a novel GAN framework, named Dynamic Multi Adversarial Training (DMAT), that prevents Catastrophic Forgetting in GANs by dynamically spawning additional discriminators during training.

• We propose a computationally efficient synthetic data generation procedure for studying mode collapse in GANs that allows visualizing high-dimensional data using normalizing flows.

• We show that mode collapse occurs even in recent robust GAN formulations.

• Our method can be plugged into any state-of-the-art GAN framework and still improve the quality and coverage of the generated samples.
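The detect-and-spawn logic described above can be caricatured in a few lines. The following is a minimal sketch, not the paper's implementation: the `ToyDiscriminator` stand-in (an interval classifier), the exemplar buffer, and the accuracy threshold are all illustrative assumptions intended only to show the control flow of spawning a discriminator when forgetting is detected.

```python
import numpy as np

class ToyDiscriminator:
    """Stand-in discriminator: flags 1-D samples inside [lo, hi] as fake."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def predicts_fake(self, x):
        return (np.asarray(x) >= self.lo) & (np.asarray(x) <= self.hi)

def exemplar_accuracy(disc, fake_exemplars):
    """Fraction of stored fake exemplars the discriminator still classifies as fake."""
    return disc.predicts_fake(fake_exemplars).mean()

def maybe_spawn(discriminators, fake_exemplars, threshold=0.9):
    """If no discriminator remembers the exemplars, spawn one that covers them."""
    if all(exemplar_accuracy(d, fake_exemplars) < threshold for d in discriminators):
        ex = np.asarray(fake_exemplars)
        # New discriminator is dedicated to the forgotten region and left intact.
        discriminators.append(ToyDiscriminator(ex.min() - 0.1, ex.max() + 0.1))
    return discriminators

# A discriminator that has drifted to watch only the right mode...
discs = [ToyDiscriminator(1.5, 2.5)]
# ...while the stored exemplars come from the forgotten left mode.
exemplars = np.array([-2.1, -2.0, -1.9])

discs = maybe_spawn(discs, exemplars)
print(len(discs))  # 2: a new discriminator now covers the forgotten mode
```

In the full method the spawned discriminator would be a trainable network rather than a fixed interval, but the trigger condition, low accuracy on stored exemplars, is the same.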

2. RELATED WORKS

Previous works have focused on independently solving either catastrophic forgetting in supervised learning or mode collapse during GAN training. Among the efforts addressing mode collapse, a few prior works have proposed multi-adversarial solutions similar to ours. In this section, we review these works in detail and discuss our commonalities and differences.

2.1. MITIGATING MODE COLLAPSE IN GANS

Alongside advances in the perceptual quality of images generated by GANs (Miyato et al., 2018; Karras et al., 2019; Brock et al., 2018; Karras et al., 2020), a large number of papers (Durugkar et al., 2016; Metz et al., 2016; Arjovsky et al., 2017; Srivastava et al., 2017; Nguyen et al., 2017; Lin et al., 2018; Mescheder et al., 2018; Karras et al., 2019) identify the problem of mode collapse in GANs and aim to mitigate it. However, many of them do not attempt to address mode collapse directly, as it was seen as a secondary symptom that would naturally be resolved as the stability of GAN optimization improves (Arjovsky et al., 2017; Mescheder et al., 2018; Bau et al., 2019). While the magnitude of mode collapse is certainly mitigated by more stable optimization, we show that it is still not a solved problem. To explicitly address mode collapse, Unrolled GAN (Metz et al., 2016) proposes an unrolled optimization of the discriminator to optimally match the generator objective, thus preventing mode collapse. VEEGAN (Srivastava et al., 2017) utilizes a reconstruction loss on the latent space. PacGAN (Lin et al., 2018) feeds multiple samples of the same class to the discriminator when making real/fake decisions. In contrast, our approach differs in that it can be plugged into existing state-of-the-art GAN frameworks to yield an additional performance boost.



Figure 1: Visualizing training trajectories: We visualize the distributions of real (green dots) and fake (blue dots) samples over the course of training for the vanilla GAN (top row) and our method (second row and below). The background color indicates the prediction heatmap of the discriminator, with blue being fake and warm yellow being real. Once the vanilla GAN falls into mode collapse (top row), it ends up oscillating between the two modes without convergence. Moreover, the discriminator's prediction at point X oscillates, indicating catastrophic forgetting in the discriminator. With our DMAT procedure, a new discriminator is dynamically spawned during training. The additional discriminator effectively learns the forgotten mode, guiding the GAN optimization toward convergence.

