FEW-SHOT TEXT CLASSIFICATION WITH DUAL CONTRASTIVE CONSISTENCY TRAINING

Abstract

In this paper, we explore how to leverage pre-trained language models for few-shot text classification, where only a few annotated examples are given for each class. Since fine-tuning a language model with the traditional cross-entropy loss in this scenario causes serious overfitting and leads to sub-optimal generalization, we adopt supervised contrastive learning on the few labeled examples and consistency regularization on abundant unlabeled data. Moreover, we propose a novel contrastive consistency schema to further boost model performance and refine sentence representations. Through extensive experiments on four datasets, we demonstrate that our model (FTCC) outperforms state-of-the-art methods and is more robust.

1. INTRODUCTION

Text classification is a fundamental task in natural language processing with various applications such as question answering (Rajpurkar et al., 2016), spam detection (Shahariar et al., 2019) and sentiment analysis (Chong et al., 2014). With the advancement of deep learning, fine-tuning pre-trained language models (Devlin et al., 2019; Liu et al., 2019) has achieved significant success. However, it still requires a large amount of labeled data to reach optimal generalization. Researchers have therefore increasingly focused on semi-supervised text classification, where only a small number of annotated examples is provided. The success of semi-supervised methods results from the use of abundant unlabeled data: unlabeled documents in the training set provide natural consistency regularization by constraining model predictions to be invariant to small perturbations of the text input (Xie et al., 2019; Miyato et al., 2016; Chen et al., 2020a).

Despite mitigating the annotation burden, these methods are highly unstable across different runs and can still easily overfit the very limited labeled data. Inspired by the success of supervised contrastive learning in few-shot settings (Gunel et al., 2021; Chen et al., 2022), we hypothesize that the contrastive representations learned in this scenario can help tackle the aforementioned high-variance issue and impose additional constraints on the model. Label information and feature structure can then be propagated simultaneously from labeled examples to unlabeled ones. Accordingly, we devise a novel contrastive consistency schema to further boost model performance.

To validate the effectiveness and robustness of FTCC, we conduct extensive experiments on four datasets. The results confirm that FTCC improves the performance of few-shot text classification.
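To make the consistency-regularization idea concrete, the following is a minimal numpy sketch, not the authors' exact formulation: the model's prediction on the original input serves as a fixed target, and predictions on a perturbed (augmented) version are penalized for diverging from it via KL divergence. The specific divergence and the stop-gradient treatment of the target are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_orig, logits_aug):
    """Mean KL(p_orig || p_aug) over the batch.

    p_orig (prediction on the clean input) is treated as a fixed target;
    in a training loop, gradients would flow only through logits_aug.
    """
    p = softmax(logits_orig)  # target distribution
    q = softmax(logits_aug)   # distribution on the augmented input
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# Identical predictions incur (near-)zero loss; diverging ones are penalized.
logits = np.array([[2.0, -1.0, 0.5]])
perturbed = np.array([[1.5, -0.5, 0.5]])
assert abs(consistency_loss(logits, logits)) < 1e-9
assert consistency_loss(logits, perturbed) > 0.0
```

In practice the perturbed input would come from a text augmentation such as back-translation or token-level noise, and this loss would be computed on unlabeled documents only.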
Based on this motivation, the contributions of this paper are as follows:

• We integrate a supervised contrastive learning objective into a consistency-regularized semi-supervised framework to perform text classification in the few-shot scenario.

• We devise a novel contrastive consistency schema to dynamically propagate feature structure from labeled data to unlabeled data.

• We demonstrate our model's superiority over state-of-the-art semi-supervised methods, analyze the contribution of each component of FTCC through an ablation study, and visualize the learned instance representations, showing the necessity of each loss and the advantage of FTCC over BERT fine-tuning with cross-entropy for representation learning.
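For reference, a minimal numpy sketch of the supervised contrastive objective in the style of Khosla et al. (2020), which the framework builds on: for each anchor, embeddings of same-class examples are pulled together relative to all other examples in the batch. The temperature value and masking details here are illustrative assumptions, not the paper's exact hyperparameters.

```python
import numpy as np

def _logsumexp(x, axis):
    # Stable log-sum-exp; tolerates -inf entries (exp(-inf) = 0).
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    features: (N, d) array of embeddings (L2-normalized internally).
    labels:   (N,) integer class labels.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = feats.shape[0]
    sim = feats @ feats.T / temperature
    self_mask = np.eye(n, dtype=bool)
    # Positives: same label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = np.where(self_mask, -np.inf, sim)          # exclude self-similarity
    log_prob = sim - _logsumexp(sim, axis=1)         # log-softmax over others
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    counts = pos_mask.sum(axis=1)
    valid = counts > 0                               # anchors with >=1 positive
    return float(-np.mean(pos_log_prob[valid] / counts[valid]))

# Perfectly separated classes yield a small loss.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
assert 0.0 < supcon_loss(feats, labels) < 0.01
```

Within FTCC, this loss would be applied to the few labeled examples, complementing the consistency term on unlabeled data.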

