CLOSED BOUNDARY LEARNING FOR NLP CLASSIFICATION TASKS WITH THE UNIVERSUM CLASS

Abstract

The Universum class, often known as the other class or the miscellaneous class, is defined as a collection of samples that do not belong to any class of interest. It is a typical class that exists in many classification-based tasks in natural language processing (NLP), such as relation extraction, named entity recognition, and sentiment analysis. During data labeling, a significant number of samples are annotated as Universum because there are always samples that exist in the dataset but do not belong to the preset target classes and are not of interest in the task. The Universum class exhibits very different properties, namely heterogeneity and a lack of representativeness in the training data; however, existing methods often treat the Universum class equally with the classes of interest. Although the Universum class contains only samples that are not of interest, improper treatment will result in the misclassification of samples that are of interest. In this work, we propose a closed boundary learning method that treats the Universum class and the classes of interest differently. We apply closed decision boundaries to the classes of interest and designate the area outside all closed boundaries in the feature space as the space of the Universum class. Specifically, we formulate the closed boundaries as arbitrary shapes, propose a strategy to estimate the probability of the Universum class according to its unique property rather than its within-class sample distribution, and propose a boundary learning loss that learns decision boundaries based on the balance of misclassified samples inside and outside each boundary. By conforming to the natural properties of the Universum class, our method improves both the accuracy and the robustness of classification models. We evaluate our method on 6 state-of-the-art works across 3 different tasks, and the F1 score/accuracy of all 6 works is improved. Experimental results also indicate that our method significantly enhances the robustness of the model, with the largest absolute F1 score improvement exceeding 8% on the robustness evaluation dataset. Our code will be released on GitHub.

1. INTRODUCTION

In classification-based tasks of NLP, we quite often encounter a class named the other class, miscellaneous class, neutral class, or outside (O) class. Such a class is a collection of samples that do not belong to any class of interest, such as samples of the no relation class in the relation extraction task. We adopt the terminology of (Weston et al., 2006) and designate all such classes as the Universum class (U). The Universum class exists in various classification-based problems in NLP, such as relation extraction (RE) (Zhang et al., 2017), named entity recognition (NER) (Tjong Kim Sang & De Meulder, 2003), sentiment analysis (SA), and natural language inference (NLI) (Bowman et al., 2015). To distinguish the Universum class from the rest of the classes, we refer to the classes of interest as target classes (T). The set of all classes (A) in the training and testing data can be expressed as A = U ∪ T.
• Universum class: A collection of samples that do not belong to any class of interest.
• Target class: A class of interest in the task, i.e., one of the classes other than the Universum class.
The sample compositions of the Universum class and the target classes are usually very different. Figure 1(a) provides some samples of a target class (entity-destination) and the Universum class (other) in relation extraction. From the examples, we can observe that the entity-destination samples adhere to an intrinsic pattern: an entity goes somewhere. However, the three examples of the other relation type are vastly dissimilar and do not exhibit any common intrinsic pattern. In fact, the Universum samples are labeled according to an extrinsic pattern: they do not belong to any of the predefined target classes. We further highlight the differences between the Universum class and the target classes in two properties.
(1) Heterogeneity: The Universum class is composed of heterogeneous samples, which may form multiple clusters in the feature space of the test set, as illustrated by the green samples in Figure 1(b). This is because the Universum class, as the class name "other" implies, contains all potential implicit classes that are not explicitly defined in the task. For example, in the other samples given in Figure 1(a), implicit classes may include the entity-parallel relationship, the entity-fill relationship, and the entity-narrative relationship. Although such heterogeneous samples are easily mapped into a compact cluster for the training set, this is problematic for the test set. Since the natural predictive rule of the Universum class is an extrinsic pattern, namely not belonging to any target class, the model is more likely to fit the noise in the Universum class by memorizing various peculiarities of intrinsically heterogeneous samples rather than finding the general predictive rule. Because the data distribution of the test set differs from that of the training set, merely memorizing such peculiarities can easily lead to overfitting and result in an accuracy drop. Moreover, a lack of robustness is another consequence of the inability to find the extrinsic predictive rule for the Universum class.

(2) Lack of Representativeness in Training Data: The Universum class is the complementary set of the predefined target classes in the task. Therefore, it contains all possible implicit classes, i.e., classes not explicitly defined in the task that may nevertheless appear in the real world. As a result, the Universum samples in the training data are unable to sufficiently represent all possible patterns of the genuine distribution of the Universum class. As depicted in Figure 1(b), gray samples represent Universum samples in the test set that are not represented by the training data. Classifiers with open boundaries are prone to misclassifying such unseen test samples.

Despite the substantial difference between the target classes and the Universum class, this issue has long been neglected by the NLP research community. The majority of works (Zhu & Li, 2022; Ye et al., 2022; Wan et al., 2022; Pouran Ben Veyseh et al., 2022; Tian et al., 2021; Fu et al., 2021; Li et al., 2021b) treat the Universum class and the target classes equally. Typically, a linear layer and a softmax function are applied at the end of the model to generate open decision boundaries, which we believe are inappropriate for tasks containing the Universum class. How can the different properties of the Universum class and the target classes be taken into account in classifier design? In this work, we propose a closed boundary learning method for classification-based tasks with the Universum class. Traditional methods often adopt open-boundary classifiers since the problem is posed under the closed-world assumption. However, open decision boundaries can easily misclassify Universum samples, as illustrated in Figure 1(c). We therefore propose to use closed-boundary classifiers, as shown in Figure 1(d). We constrain the space of the target classes to be closed and designate the area outside all closed boundaries in the feature space as the space of the Universum class. This treatment perfectly fits the nature of the Universum class: a sample is marked as Universum if it does not belong to any target class during labeling. The aforementioned two properties are also well addressed in this way.

The main contributions of this work are summarized as follows:
• We bring attention to an important issue that is frequently neglected in classification-based NLP tasks like NER, RE, SA, and NLI: the Universum class, such as the other class, exhibits very different properties from the target classes and should be treated differently.
• Methodologically, we generate closed boundaries with arbitrary shapes, which include the commonly used spherical boundary as a special case. In addition, we leverage the information of both the target classes and the Universum class to learn the decision boundaries, and we propose a boundary learning loss that generates each boundary based on the balance of misclassified Universum samples and target-class samples.
• In contrast to the intuitive two-step pipeline, in which separate Universum identification and multi-class classification may lead to error propagation, we develop a strategy to estimate the probability of the Universum class without relying on its intrinsic sample distribution and learn the classification of the Universum class and the target classes jointly.
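To make the closed-boundary decision rule concrete, the following is a rough illustrative sketch using the spherical boundary special case: each target class keeps a center and radius in feature space, and any sample falling outside every closed boundary is assigned to the Universum class. The function name, the nearest-boundary tie-break, and all toy values are assumptions for illustration only, not the actual formulation or implementation of this work.

```python
import numpy as np

UNIVERSUM = -1  # label for the Universum ("other") class

def closed_boundary_predict(x, centers, radii):
    """Assign x to a target class whose closed (spherical) boundary
    contains it; if x lies outside all boundaries, assign Universum.

    x       : (d,) feature vector
    centers : (k, d) per-target-class boundary centers
    radii   : (k,) per-target-class boundary radii
    """
    dists = np.linalg.norm(centers - x, axis=1)  # distance to each center
    inside = dists <= radii                      # boundaries that contain x
    if not inside.any():
        return UNIVERSUM                         # outside all closed boundaries
    # among the boundaries containing x, pick the relatively closest one
    margin = dists / radii
    margin[~inside] = np.inf
    return int(np.argmin(margin))

# toy example: two target classes with spherical closed boundaries
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
radii = np.array([1.0, 2.0])

print(closed_boundary_predict(np.array([0.5, 0.0]), centers, radii))   # inside class 0 -> 0
print(closed_boundary_predict(np.array([3.0, -3.0]), centers, radii))  # outside all -> -1
```

Note how this rule reflects the extrinsic labeling pattern of the Universum class: a sample is never matched against a learned Universum distribution; it becomes Universum purely by exclusion from all target-class regions.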

