LEARNING TOP-K CLASSIFICATION WITH LABEL RANKING

Abstract

As the number of classes grows, class confusability and the multi-label nature of examples inevitably arise, posing a serious challenge to classification. To mitigate this problem, top-k classification has been proposed: the classifier predicts k candidate labels, and the prediction is considered correct as long as the ground truth label is among them. However, existing top-k classification methods neglect the ranking of the ground truth label among the k predicted labels, which has high practical value. In this paper, we propose a novel three-stage approach to learning top-k classification with label ranking. We first propose an ensemble-based relabeling method that relabels the training data with k labels, which are then used to train the top-k classifier. We then propose a novel top-k classification loss function that aims to improve the ranking of the ground truth label. Finally, we conduct extensive experiments on four text datasets and four image datasets, and the results show that our method significantly improves the performance of existing methods.

1. INTRODUCTION

Multi-class classification aims to classify examples into one of more than two classes. As the number of classes increases to a large extent, e.g., thousands of classes, training a multi-class classifier becomes extremely challenging due to the multi-label nature of examples and class confusability (Gupta et al., 2014; Lapin et al., 2015; Chang et al., 2017). To mitigate this problem, the task of top-k classification has been proposed (Berrada et al., 2018; Petersen et al., 2022), where the classifier predicts k candidate labels and the prediction is considered correct as long as the ground truth label is among them. This evaluation measure is commonly referred to as the top-k error (Lapin et al., 2016), i.e., the loss function does not penalize k-1 mistakes. Although state-of-the-art models trained directly with cross-entropy can also yield remarkable results in terms of top-k error, the training data must be both large and clean (Berrada et al., 2018), which cannot be guaranteed in real scenarios. Moreover, traditional top-1 loss functions such as cross-entropy may overfit when noisy labels exist (Berrada et al., 2018). Hence, loss functions tailored to top-k error minimization are needed. However, existing top-k classification loss functions (Lapin et al., 2015; Chang et al., 2017; Berrada et al., 2018) only consider whether the ground truth label appears among the predicted k labels and neglect its ranking within the top-k candidates. In fact, ranking is crucial for top-k classification. For example, in a classic human-in-the-loop (Zanzotto, 2019) data annotation scenario, a classification model trained with a small amount of labeled data predicts a label for each unlabeled example, and humans are required to check the model's predictions and relabel those with low confidence.
However, manually selecting the correct label from a large label set is time-consuming and inefficient. In this case, the model is allowed to predict the k most likely labels so that humans can easily find the ground truth label among them, i.e., top-k classification. Meanwhile, improving the ranking of the ground truth label within the k labels allows humans to spot it immediately, which effectively improves checking efficiency. The ranking motivation behind this scenario also aligns with applications such as recommender systems and search engines (Oosterhuis & de Rijke, 2020). Therefore, in this paper, we aim to design a top-k classification method that predicts the k most likely labels such that the ground truth label is not only included but also ranked as high as possible. Inspired by the idea of multi-label classification (MLC) (Tsoumakas & Katakis, 2007), we propose a novel three-stage approach for top-k classification. As shown in Figure 1, in the first stage, we use the existing training and validation sets to train the base classifiers that will be used to predict k labels for each training example in the next stage. Since correct k labels are crucial for top-k classification in our problem setting, we follow the idea of ensemble learning (Sagi & Rokach, 2018) and train m base classifiers, each trained with the classic cross-entropy loss on a subset randomly selected from the training and validation sets. In the second stage, we relabel each single-label example as a k-label example consisting of its ground truth label and the k-1 most likely other labels. More specifically, we predict m probability distributions p for each example with the m base classifiers, and then average all m probability distributions to obtain the average probability distribution p_avg.
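The ensemble relabeling of the first two stages can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: it assumes the m softmax outputs for one example are already available as an (m, C) array, and the function name and argument layout are hypothetical.

```python
import numpy as np

def relabel_example(probs, y_true, k):
    """Average m base-classifier distributions and pick k-1 pseudo-labels.

    probs  : (m, C) array, one softmax distribution per base classifier
    y_true : int, index of the ground truth class
    k      : total number of labels after relabeling
    Returns [y_true, pseudo_1, ..., pseudo_{k-1}], with pseudo-labels
    ordered by decreasing averaged probability.
    """
    p_avg = probs.mean(axis=0)        # ensemble average p_avg
    p_avg[y_true] = -np.inf           # exclude the ground truth label itself
    pseudo = np.argsort(p_avg)[::-1][:k - 1]
    return [y_true] + pseudo.tolist()
```

In practice the selection would be run once over the whole training set after the m base classifiers have been trained, producing the k-label examples used in the third stage.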
Finally, according to p_avg, we output the k-1 most likely labels for each training example besides its ground truth label; we refer to these k-1 labels as pseudo-labels. In the third stage, based on the transformed k-label examples, we train a multi-label classifier that predicts exactly k labels for a test example; the trained multi-label classifier can then be viewed as a top-k classifier. Importantly, we propose a new top-k loss function with label ranking (TkLR) for training in this stage. For an example with k labels, TkLR aims to maximize the difference between the scores of these k labels and the scores of the other labels. To improve the ranking of the ground truth label, we embed an additional rank loss in TkLR, which aims to maximize the difference between the score of the ground truth label and the scores of the pseudo-labels. Finally, we conduct extensive experiments on four text datasets and four image datasets with BERT (Kenton & Toutanova, 2019) and Swin Transformer (Liu et al., 2021) as the backbone models, respectively, and evaluate the results with top-k accuracy and Normalized Discounted Cumulative Gain at top k (NDCG@K) (Wang et al., 2013). The experimental results demonstrate that our method significantly improves the performance of existing top-k classification methods. In brief, the main contributions of this paper are summarized as follows:

• We propose to consider the ranking of the ground truth label in top-k classification, which has high practical value but has not been well addressed so far.

• We propose a novel three-stage approach to learn top-k classification with label ranking, which can be easily deployed with different classification models.

• We propose an ensemble-based relabeling method to obtain the k most likely labels for each training example, which benefits the final top-k classification.
• We propose a novel top-k loss function that takes the ranking of the ground truth label into account.

• Extensive experiments on different text and image datasets show that our method greatly outperforms existing baselines in terms of top-k accuracy and NDCG@K.

The remainder of this paper is structured as follows. Section 2 presents related work. Section 3 describes our approach. Section 4 describes our experiments. Section 5 concludes the paper.
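The exact form of TkLR is given in Section 3; the following is only a hedged, margin-based sketch of the two terms described above, a set-separation term plus a rank term. The hinge form, the margin, and the weight alpha are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def tklr_loss_sketch(scores, label_set, y_true, margin=1.0, alpha=0.5):
    """Margin-based sketch of a top-k loss with label ranking.

    scores    : (C,) raw scores for one example
    label_set : list of k label indices (ground truth + pseudo-labels)
    y_true    : ground truth label index, contained in label_set
    The first term pushes every label in the set above every label
    outside it; the second (rank) term pushes the ground truth above
    the pseudo-labels, improving its ranking within the top k.
    """
    in_set = np.zeros(scores.shape[0], dtype=bool)
    in_set[label_set] = True
    pos, neg = scores[in_set], scores[~in_set]
    # hinge over all (in-set, out-of-set) score pairs
    set_loss = np.maximum(0.0, margin + neg[None, :] - pos[:, None]).mean()
    # rank term: ground truth above each pseudo-label
    pseudo = [c for c in label_set if c != y_true]
    rank_loss = np.maximum(0.0, margin + scores[pseudo] - scores[y_true]).mean()
    return set_loss + alpha * rank_loss
```

A training implementation would express the same computation in an autograd framework so that gradients flow back through the scores; the NumPy forward pass here only illustrates the structure of the two terms.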



Figure 1: The proposed three-stage approach for top-k classification.

