SELF-CONSISTENT LEARNING: COOPERATION BETWEEN GENERATORS AND DISCRIMINATORS

Abstract

Using generated data to improve the performance of downstream discriminative models has recently gained popularity owing to the rapid development of pre-trained language models. In most previous studies, generative and discriminative models are trained separately and thus cannot adapt to changes in each other. As a result, the generated samples easily deviate from the real data distribution, and the improvement of the discriminative model quickly saturates. Generative adversarial networks (GANs) jointly train generative and discriminative models via an adversarial process. However, the training of standard GANs is notoriously unstable and often fails to converge. In this paper, to address these issues, we propose a self-consistent learning framework in which a discriminator and a generator are cooperatively trained in a closed-loop form. The discriminator and the generator enhance each other over multiple rounds of alternating training until a scoring consensus is reached. This framework proves easy to train and free from instabilities such as mode collapse and non-convergence. Extensive experiments on sentence semantic matching demonstrate the effectiveness of the proposed framework: the discriminator achieves an improvement of more than 10 AP in the zero-shot setting and new state-of-the-art performance in the full-data setting.

1. INTRODUCTION

The advance of Pre-trained Language Models (PLMs) (Brown et al., 2020; Chowdhery et al., 2022) has substantially improved the performance of deep neural networks across a variety of Natural Language Processing (NLP) tasks. Various language models based on the Transformer (Vaswani et al., 2017) architecture have been proposed, leading to state-of-the-art (SOTA) performance on fundamental discrimination tasks. These models are first trained with self-supervised objectives (e.g., predicting masked tokens from surrounding tokens) on massive unlabeled text data, then fine-tuned on annotated data to adapt to downstream tasks of interest. However, annotated data is usually scarce for a wide range of downstream tasks, which results in overfitting and a lack of generalization to unseen data.

One straightforward way to deal with this data scarcity problem is data augmentation (Xie et al., 2020), and incorporating generative models to perform data augmentation has been widely adopted recently (Carlini et al., 2021; Gangal et al., 2022). Despite its popularity, the generated text can easily deviate from the real data distribution when none of the signals from the discrimination task are passed back to the generator. In previous studies, generative data augmentation and discrimination have been well studied as separate problems, but it is less clear how the two can be combined in a single framework and how their performances can be improved simultaneously. Generative Adversarial Networks (GANs) (Goodfellow et al., 2014; Gulrajani et al., 2017) are a notable attempt to couple generative and discriminative models in an adversarial manner through a carefully crafted two-player minimax game. GANs have achieved tremendous success in domains such as image generation (Denton et al., 2015), and related studies have also shown their effectiveness in semi-supervised learning (Salimans et al., 2016; Kumar et al., 2017).
However, GANs are notoriously difficult to train: most training objectives work well for only one model, either the discriminator or the generator, so both learners can rarely be optimal at the same time (Arjovsky & Bottou, 2017; Wiatrak et al., 2019). This arises from the adversarial nature of GANs, whereby optimizing one learner can easily destroy the learning ability of the other, causing GANs to fail to converge. Another limitation of jointly optimizing the generator and the discriminator comes from the discrete nature of text in NLP, as no gradient can be propagated from the discriminator to the generator. One theoretically sound attempt is to use reinforcement learning (RL), but the sparsity and high variance of rewards in NLP make the training particularly unstable (Caccia et al., 2020).

To address these shortcomings, we introduce a novel self-consistent learning framework based on one generator and one discriminator: the two are alternately trained through cooperation instead of competition, with generated samples serving as the medium for passing the feedback signal from the discriminator. Specifically, in each round of training, the samples produced by the generator are synthetically labeled by the discriminator, and only a subset of them, selected according to dynamic thresholds, is used to train the discriminator and the generator in the next round. Several benefits follow from this cooperative training process. First, a closed-loop form of cooperation is established, so the generator and the discriminator can reach optimality at the same time. Second, the framework improves generation quality while ensuring the domain specificity of the generator, which in turn benefits training. Third, a steady stream of diverse synthetic samples is added to the training in each round, leading to continuous performance improvement for both learners.
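The round-based procedure described above can be sketched as follows. This is a minimal toy illustration, not the paper's actual implementation: the `ToyGenerator`, `ToyDiscriminator`, scoring heuristic, and threshold schedule (`t0`, `t_step`) are all hypothetical stand-ins for the real PLM-based models.

```python
import random

class ToyGenerator:
    """Hypothetical stand-in: 'generates' candidates by sampling the corpus."""
    def generate(self, corpus, n=20):
        return [random.choice(corpus) for _ in range(n)]
    def train(self, samples):
        pass  # in the real framework: fine-tune on the selected samples

class ToyDiscriminator:
    """Hypothetical stand-in: scores samples with a fixed length heuristic."""
    def score(self, samples):
        return [min(1.0, len(s) / 20) for s in samples]
    def train(self, samples):
        pass  # in the real framework: fine-tune on the pseudo-labeled samples

def select(samples, scores, threshold):
    """Keep only samples whose discriminator score clears the threshold."""
    return [s for s, p in zip(samples, scores) if p >= threshold]

def self_consistent_loop(generator, discriminator, corpus, rounds=4,
                         t0=0.5, t_step=0.1):
    """One cooperative cycle per round: generate -> pseudo-label ->
    threshold-select -> retrain both learners on the consensus samples."""
    kept_per_round = []
    for r in range(rounds):
        # dynamic threshold: tighten the selection as training progresses
        threshold = min(t0 + r * t_step, 0.95)
        candidates = generator.generate(corpus)
        scores = discriminator.score(candidates)
        selected = select(candidates, scores, threshold)
        discriminator.train(selected)  # cooperative, not adversarial
        generator.train(selected)
        kept_per_round.append(len(selected))
    return kept_per_round
```

The key design choice visible here is that the discriminator's output is used as a filter rather than as an adversarial loss, so neither learner's update can destabilize the other.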
Finally, we can start training with only a domain-related corpus and obtain strong results, and such data can be easily sampled with little cost or supervision. Moreover, performance on labeled datasets can be boosted beyond the current SOTA level. To demonstrate the effectiveness of our framework, we examine it on the task of sentence semantic matching. The experiments show that our method significantly improves over standalone state-of-the-art discriminative models in the zero-shot and full-data settings. Our contributions are summarized as follows:

• We propose a self-consistent learning framework that incorporates the generator and the discriminator, in which both achieve remarkable performance gains simultaneously.

• We propose a dynamic selection mechanism such that cooperation between the generator and the discriminator drives convergence toward their scoring consensus.

• Experimental results show that our proposed framework significantly outperforms state-of-the-art methods on the task of sentence semantic matching.

2. RELATED WORK

To alleviate the lack of annotated data in supervised learning for NLP, semi-supervised learning (SSL) has been a popular line of research (Van Engelen & Hoos, 2020). The unlabeled data required by SSL is either collected from the target domains or produced by generative language models. NLU models can then learn from the unlabeled data through pseudo-labeling (Arazo et al., 2020; Banitalebi-Dehkordi & Zhang, 2021) and consistency regularization (Jeong et al., 2019; Sohn et al., 2020). However, collecting unlabeled data comes at a cost (though smaller than that of labeling), and the total amount is limited. Even with generative models, there is no guarantee on the quality of the generated samples, because the model cannot adjust its outputs based on the performance of the downstream task. In contrast, our method includes a continuously updated generative model that dynamically adjusts its generation according to downstream performance.

In GANs, the generator is adversarially trained against the discriminator. Unlike conventional GANs in continuous domains, language GANs usually employ Gumbel-Softmax differentiation (Jang et al., 2017; Yin et al., 2020), Reinforcement Learning (RL) (Yu et al., 2017; Wu et al., 2021), or modified training objectives (Montahaei et al., 2021) to update the generator, in order to exploit the non-differentiable signals from the discriminator. However, language GANs are often criticized for underperforming Maximum Likelihood Estimation (MLE) and are very difficult to train; even the optimality of the generator or the discriminator alone cannot be guaranteed (Alvarez-Melis et al., 2022). In comparison, our proposed framework cooperatively couples the generator and the discriminator, leading to continuous improvement for both learners.
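As a point of contrast with our cooperative setup, the Gumbel-Softmax relaxation that language GANs rely on can be sketched as follows. This is a minimal illustration of the standard trick (Jang et al., 2017), not the cited implementations; the logits in the usage example are made up.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0):
    """Draw a 'soft' one-hot sample from a categorical distribution:
    perturb the logits with Gumbel(0, 1) noise, then apply a softmax
    with temperature tau (lower tau -> closer to a hard one-hot).
    The result is differentiable w.r.t. the logits, which is what lets
    a discriminator's gradient reach a text generator."""
    gumbels = [-math.log(-math.log(random.random())) for _ in logits]
    perturbed = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(perturbed)                      # subtract max for stability
    exps = [math.exp(p - m) for p in perturbed]
    z = sum(exps)
    return [e / z for e in exps]
```

In a language GAN, such a soft sample over the vocabulary replaces the non-differentiable argmax at each generation step, so the discriminator's signal can be backpropagated to the generator; our framework avoids this machinery entirely by passing feedback through selected samples instead of gradients.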

