SELF-CONSISTENT LEARNING: COOPERATION BETWEEN GENERATORS AND DISCRIMINATORS

Abstract

Using generated data to improve the performance of downstream discriminative models has recently gained popularity, driven by the rapid progress of pre-trained language models. In most previous studies, generative and discriminative models are trained separately and thus cannot adapt to changes in each other. As a result, the generated samples easily deviate from the real data distribution, while the improvement of the discriminative model quickly saturates. Generative adversarial networks (GANs) jointly train generative and discriminative models via an adversarial process. However, the training of standard GANs is notoriously unstable and often fails to converge. In this paper, to address these issues, we propose a self-consistent learning framework, in which a discriminator and a generator are cooperatively trained in a closed-loop form. The discriminator and the generator enhance each other during multiple rounds of alternating training until a scoring consensus is reached. This framework proves to be easy to train and free from instabilities such as mode collapse and non-convergence. Extensive experiments on sentence semantic matching demonstrate the effectiveness of the proposed framework: the discriminator achieves an improvement of 10+ AP in the zero-shot setting and new state-of-the-art performance in the full-data setting.
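For illustration, the closed-loop alternation described above can be sketched as follows. This is a minimal, schematic sketch rather than the authors' implementation: all names (Generator, Discriminator, self_consistent_round, the acceptance threshold, and the consensus criterion) are hypothetical placeholders, and the two classes are toy stand-ins for fine-tuned PLMs.

```python
# Schematic sketch of closed-loop cooperative training (hypothetical names).
import random

class Generator:
    """Toy stand-in for a generative PLM that proposes candidate samples."""
    def propose(self, seed_texts, n=4):
        # In practice this would sample from a fine-tuned language model.
        return [f"{t} <candidate #{i}>" for t in seed_texts for i in range(n)]

    def update(self, accepted):
        # In practice: fine-tune the generator on discriminator-accepted samples.
        pass

class Discriminator:
    """Toy stand-in for a discriminative PLM that scores candidate samples."""
    def score(self, samples):
        # In practice: the classifier's probability for the target label.
        return [random.random() for _ in samples]

    def update(self, accepted):
        # In practice: fine-tune the discriminator on the filtered samples.
        pass

def self_consistent_round(gen, disc, seed_texts, threshold=0.8):
    """One round: generate, score, filter, then update both models."""
    candidates = gen.propose(seed_texts)
    scores = disc.score(candidates)
    accepted = [c for c, s in zip(candidates, scores) if s >= threshold]
    gen.update(accepted)
    disc.update(accepted)
    # Acceptance rate serves here as a crude proxy for "scoring consensus".
    return len(accepted) / max(len(candidates), 1)

gen, disc = Generator(), Discriminator()
for round_idx in range(5):
    agreement = self_consistent_round(gen, disc, ["an example sentence"])
    print(f"round {round_idx}: agreement={agreement:.2f}")
    if agreement > 0.95:  # stop once the two models broadly agree
        break
```

Unlike the GAN minimax game discussed in Section 1, both update steps push in the same direction: each model is refined on samples the other endorses, so the loop is cooperative rather than adversarial.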

1. INTRODUCTION

The advance of Pre-trained Language Models (PLMs) (Brown et al., 2020; Chowdhery et al., 2022) has substantially improved the performance of deep neural networks across a variety of Natural Language Processing (NLP) tasks. Various language models based on the Transformer (Vaswani et al., 2017) architecture have been proposed, leading to state-of-the-art (SOTA) performance on fundamental discrimination tasks. These models are first trained with self-supervised objectives (e.g., predicting masked tokens from surrounding tokens) on massive unlabeled text data, then fine-tuned on annotated data to adapt to downstream tasks of interest. However, annotated data is usually scarce for a wide range of downstream tasks, which results in overfitting and a lack of generalization to unseen data. One straightforward way to deal with this data scarcity problem is data augmentation (Xie et al., 2020), and incorporating generative models to perform data augmentation has been widely adopted recently (Carlini et al., 2021; Gangal et al., 2022).

Despite its popularity, the generated text can easily deviate from the real data distribution when no signal from the discrimination task is exploited. In previous studies, generative data augmentation and discrimination have been well studied as separate problems, but it is less clear how the two can be combined in one framework and how their performances can be improved simultaneously.

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014; Gulrajani et al., 2017) are good attempts to couple generative and discriminative models in an adversarial manner, where a two-player minimax game between the learners is carefully crafted. GANs have achieved tremendous success in domains such as image generation (Denton et al., 2015), and related studies have also shown their effectiveness in semi-supervised learning (Salimans et al., 2016; Kumar et al., 2017). However, GANs are notoriously difficult to train: most training objectives work well for only one of the two models, so the discriminator and the generator are rarely optimal at the same time (Arjovsky & Bottou, 2017; Wiatrak et al., 2019). This essentially arises from the adversarial nature of GANs, in which the two models are optimized against each other with opposing objectives.
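For concreteness, the two-player minimax game mentioned above is the standard GAN objective of Goodfellow et al. (2014), in which a generator G and a discriminator D optimize opposing objectives over real data x and noise z:

```latex
\min_G \max_D \, V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

Since D's gain is G's loss, any improvement in one learner directly degrades the other's objective; this zero-sum structure is the root of the instabilities noted above, and it is precisely what the cooperative, closed-loop training proposed in this paper avoids.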

