UNCOVERING THE EFFECTIVENESS OF CALIBRATION ON OPEN INTENT CLASSIFICATION

Anonymous authors
Paper under double-blind review

Abstract

Open intent classification aims to identify known and unknown intents simultaneously, and it is one of the challenging tasks in modern dialogue systems. While prior approaches rely on known-intent classifiers trained under the cross-entropy loss, we presume this loss function yields a representation overly biased toward the known intents, which negatively impacts identifying unknown intents. In this study, we propose a novel open intent classification approach that incorporates model calibration into the previous state-of-the-art. We empirically show that simply changing the learning objective to a more calibrated one outperforms the past state-of-the-art. We further find that the calibrated classifier's advantage derives from the high-level layers of the deep neural network. We also discover that our approach is robust to harsh settings where only a few training samples per class exist. Consequently, we expect our findings and takeaways to provide practical guidelines for open intent classification and thereby help inform future model design choices.

1. INTRODUCTION

Background and Motivation  Beyond the success of intent classification under the supervised regime, one of the next challenges for modern dialogue systems is open intent classification (Scheirer et al., 2013). While the sets of intents in the training and test sets are identical under the supervised setting (known as the closed-set setting), an intent classifier in the real world is required to recognize unknown intents as well as known ones (Zhang et al., 2021). For example, supposing the training set includes N intents, open intent classification solves an (N + 1)-way classification problem where the (N + 1)-th class covers all unknown intents (Shu et al., 2017; Lin & Xu, 2019; Zhang et al., 2021). This task is related to open world recognition (Bendale & Boult, 2016; Vaze et al., 2021) and out-of-distribution detection (Hendrycks & Gimpel, 2016; Liang et al., 2017), which are actively studied in the image domain, but it is specifically denoted as open intent classification in the natural language processing domain. Examining previously proposed open intent classification methods, we find that most of them conventionally train the closed-set classifier with a cross-entropy loss (Bendale & Boult, 2016; Hendrycks & Gimpel, 2016; Prakhya et al., 2017; Shu et al., 2017; Lin & Xu, 2019; Zhang et al., 2021). However, we question whether the cross-entropy loss is the best learning objective for identifying open intents. A previous open intent classification study highlighted that an adequate tightness of decision boundaries among known intents is important for detecting unknown intents (Zhang et al., 2021). In other words, the inductive bias established on the known intents should be neither overly biased nor too loosely optimized.
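As a concrete picture of the (N + 1)-way setup, a simple baseline in the spirit of Hendrycks & Gimpel (2016) thresholds the maximum softmax probability: a test sample whose top confidence falls below a threshold is routed to the unknown class. A minimal NumPy sketch (the function names and the threshold value are our own illustration, not any particular system's implementation):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def open_intent_predict(logits, threshold=0.5):
    # logits: (batch, N) classifier scores over the N known intents
    probs = softmax(logits)
    pred = probs.argmax(axis=-1)
    # route low-confidence samples to the (N + 1)-th "unknown" class
    pred[probs.max(axis=-1) < threshold] = probs.shape[-1]
    return pred
```

For instance, with N = 4 known intents, a confidently classified sample keeps its predicted intent index, while a sample with a flat softmax distribution is mapped to index 4, the open class.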
This proposition is not limited to open intent classification: a recently proposed state-of-the-art open world classification study in the computer vision domain also supports it, showing that acquiring adequate representation power correlates with effective open world classification performance (Vaze et al., 2021). However, as several works have pointed out, the cross-entropy loss conveys an inductive bias that is excessively biased toward the given labels because it forces the model to select a single label from the given label space (Recht et al., 2019; Zhang et al., 2016). We therefore assume the cross-entropy loss leaves room for improvement and aim to provide an outperforming open intent classifier.

Main Idea and Its Novelty  Our work's key proposition is applying model calibration while training on the known intents. Model calibration adjusts a model's predicted probabilities to reflect the true probabilities of the corresponding outcomes (Nixon et al., 2019). According to calibration studies, calibrated deep neural networks achieve robustness against various noises and perturbations (Müller et al., 2019; Pereyra et al., 2017). Inspired by this finding, we presume that calibrating the cross-entropy loss will improve the quality of the inductive bias and raise open intent classification performance. Accordingly, we select state-of-the-art open world classification methods in the text and image domains and simply apply calibration to their training procedures. Throughout our work, we first show that calibration improves the inductive bias compared to the plain cross-entropy loss. We then examine whether this simple idea can outperform previous open intent classifiers in various problem settings, and how calibration changes the representation landscape of the trained model.
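One common way to calibrate a cross-entropy-trained classifier is label smoothing (Müller et al., 2019), which softens the one-hot targets so the model is no longer pushed toward full confidence on a single label. A minimal NumPy sketch of a smoothed cross-entropy loss (the function names and the smoothing coefficient eps are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smoothed_cross_entropy(logits, labels, num_classes, eps=0.1):
    # target: (1 - eps) on the true label, eps/(K - 1) spread over the rest;
    # eps = 0 recovers the standard one-hot cross-entropy
    probs = softmax(logits)
    target = np.full((len(labels), num_classes), eps / (num_classes - 1))
    target[np.arange(len(labels)), labels] = 1.0 - eps
    return -(target * np.log(probs + 1e-12)).sum(axis=-1).mean()
```

Because some target mass is placed on the non-true labels, a maximally confident prediction no longer minimizes the loss, which tempers the over-biased inductive bias discussed above.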
Although our idea is simple, we highlight that the proposed open intent classifiers are novel: to the best of our knowledge, ours is the first attempt to utilize calibration to improve open intent classification performance in the text domain.

Key Contributions

• We propose two novel methods for open intent classification, C-LC and C-ADB, by applying model calibration to the previously proposed state-of-the-art in the image and text domains, respectively. Under particular settings, the proposed methods become a new state-of-the-art.
• As a preliminary analysis, we show that model calibration reduces the bias of the conventional known intent classifier and increases the distribution discrepancy between known and unknown intents. We analyze that this large discrepancy contributes to better open intent classification performance.
• We further show that the advantage of C-LC and C-ADB derives from the representations at higher layers of the deep neural networks. We interpret that the proposed methods acquire a better contextual understanding than the previously proposed methods.
• Lastly, we examine our approaches' stability under extreme training-set settings. We discover that C-ADB is less stable than C-LC given few training samples per known intent; thus, using C-ADB requires careful consideration.
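As background for the C-ADB method above, the adaptive decision boundary (ADB) mechanism of Zhang et al. (2021) assigns each known intent a centroid and a learned radius in feature space, and rejects test samples that fall outside every class's boundary as unknown. A simplified sketch of that rejection rule (array shapes, names, and values here are our own illustration, not the authors' code):

```python
import numpy as np

def adb_predict(features, centroids, radii):
    # Euclidean distance of each feature vector to every intent centroid
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    nearest = dists.argmin(axis=-1)
    # outside the nearest class's learned radius -> open (unknown) intent
    unknown = dists[np.arange(len(features)), nearest] > radii[nearest]
    pred = nearest.copy()
    pred[unknown] = len(centroids)  # index N denotes the unknown class
    return pred
```

The tightness of the boundaries is governed by the learned radii, which is exactly the quantity our calibrated training objective is meant to improve.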

2. RELATED WORK

Open Intent Classification  Scheirer et al. (2013) first defined the task of open-set recognition in the computer vision domain and inspired subsequent studies. Fei & Liu (2016) applied an SVM with center-based similarity to solve open-set classification. Bendale & Boult (2016) proposed the OpenMax model, which fits a Weibull distribution to the penultimate layer of the network. Prakhya et al. (2017) and Shu et al. (2017) adopted the OpenMax model for open intent classification and showed that convolutional neural networks are good feature extractors in the NLP domain. Hendrycks & Gimpel (2016) suggested that out-of-distribution examples can be distinguished based on the softmax probability. Subsequently, post-processing-based methods were proposed. Lin & Xu (2019) apply the Large Margin Cosine Loss in a post-processing-based method; the model learns to maximize inter-class variance and minimize intra-class variance. Zhang et al. (2021) introduce a post-processing-based method that learns an adaptive decision boundary (ADB) and a centroid for each known intent. Shu et al. (2021) use several data augmentation strategies to generate distribution-shift examples on top of ADB. From these prior works, we conclude that a key ingredient of precise open intent classification is establishing adequate decision boundaries among the known intent samples, usually described as an appropriate tightness of decision boundaries (Zhang et al., 2021). As the aforementioned methods commonly employ the cross-entropy loss to train the known intent classifier, we hypothesize that this loss is not advantageous for establishing good decision boundaries. Under this motivation, we aim to find an optimal tightness of decision boundaries by applying model calibration to the known intent classifier.

Model Calibration  Calibration makes a predicted class probability reflect its ground-truth correctness likelihood. Well-calibrated confidence provides suitable information on why a neural network prediction is made. Guo et al. (2017) proposed temperature scaling to calibrate modern neural networks that suffer from over-confidence. Lee et al. (2017) suggest two additional terms on the original objective function for detecting out-of-distribution samples. Research on calibration has been widely examined.
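Temperature scaling, the post-hoc method of Guo et al. (2017), divides the logits by a scalar T fitted on a validation set; T > 1 flattens over-confident softmax outputs without changing the predicted class. A minimal sketch (the grid-search fitting of T is our own simplification of the original gradient-based fit):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, T):
    # T > 1 softens the distribution; T = 1 leaves it unchanged
    return softmax(logits / T)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    # pick the T minimizing validation negative log-likelihood
    def nll(T):
        p = temperature_scale(logits, T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)
```

Because dividing all logits by the same positive scalar preserves their ordering, accuracy is unaffected; only the confidence values are recalibrated.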

