LEARNING TOP-K CLASSIFICATION WITH LABEL RANKING

Abstract

As the number of classes in a classification task grows, class confusability and the multi-label nature of examples inevitably arise, posing a serious challenge to classification. To mitigate this problem, top-k classification has been proposed, where the classifier is allowed to predict k candidate labels and the prediction is considered correct as long as the ground truth label is included among the k labels. However, existing top-k classification methods neglect the ranking of the ground truth label within the k predicted labels, which has high application value. In this paper, we propose a novel three-stage approach to learning top-k classification with label ranking. We first propose an ensemble-based relabeling method and relabel the training data with k labels, which are then used to train the top-k classifier. We next propose a novel top-k classification loss function that aims to improve the ranking of the ground truth label. Finally, we conduct extensive experiments on four text datasets and four image datasets, and the results show that our method significantly improves the performance of existing methods.

1. INTRODUCTION

Multi-class classification aims to classify examples into one of more than two classes. As the number of classes grows large, e.g., to thousands of classes, training a multi-class classifier becomes extremely challenging due to the multi-label nature of the examples and class confusability (Gupta et al., 2014; Lapin et al., 2015; Chang et al., 2017). To mitigate this problem, the task of top-k classification has been proposed (Berrada et al., 2018; Petersen et al., 2022), where the classifier is allowed to predict k candidate labels and the prediction is considered correct as long as the ground truth label is included among the k labels. This evaluation measure is commonly referred to as the top-k error (Lapin et al., 2016), i.e., the loss function does not penalize up to k - 1 mistakes. Though state-of-the-art models trained directly with cross-entropy can also yield remarkable results in terms of top-k error, the training data must be both large and clean (Berrada et al., 2018), which cannot be guaranteed in real scenarios. Moreover, a traditional top-1 loss function like cross-entropy may overfit when noisy labels exist (Berrada et al., 2018). Hence, loss functions tailored for top-k error minimization are needed. However, existing top-k classification loss functions (Lapin et al., 2015; Chang et al., 2017; Berrada et al., 2018) only consider whether the ground truth label appears among the predicted k labels and neglect its ranking within the top-k candidates. In fact, ranking is crucial for top-k classification tasks. For example, in a classic human-in-the-loop (Zanzotto, 2019) data annotation scenario, a classification model trained with a small amount of labeled data is used to predict a label for each unlabeled example, and humans are required to check the model's predictions and relabel examples with low-confidence predictions.
However, manually selecting the correct label from a large label set is time-consuming and inefficient. In this case, the model is allowed to predict the k most likely labels so that humans can easily find the ground truth label among those k labels, i.e., top-k classification. Meanwhile, improving the ranking of the ground truth label within the k labels allows humans to find the ground truth label immediately, which effectively improves the efficiency of human checking. The ranking motivation behind this scenario also aligns with applications such as recommendation systems and search engines (Oosterhuis & de Rijke, 2020).
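To make the evaluation measure concrete, the following minimal NumPy sketch (the function name `topk_accuracy_and_rank` is ours, not from the paper) computes the standard top-k accuracy together with the mean 1-based rank of the ground truth label among the k candidates — the quantity that standard top-k losses ignore.

```python
import numpy as np

def topk_accuracy_and_rank(scores, labels, k):
    """Top-k accuracy plus the mean 1-based rank of the ground truth
    label among the k predicted candidates (misses are excluded from
    the rank average)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # indices of the k highest-scoring classes, best first
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = topk == labels[:, None]            # (n, k) boolean matrix
    correct = hits.any(axis=1)                # ground truth in top-k?
    acc = correct.mean()
    ranks = hits.argmax(axis=1)[correct] + 1  # 1-based rank of each hit
    mean_rank = ranks.mean() if correct.any() else float("nan")
    return acc, mean_rank

# toy example: 3 examples, 3 classes, k = 2
scores = [[0.1, 0.5, 0.4],   # truth = 2, found at rank 2
          [0.7, 0.2, 0.1],   # truth = 0, found at rank 1
          [0.3, 0.6, 0.1]]   # truth = 2, missed for k = 2
labels = [2, 0, 2]
acc, mean_rank = topk_accuracy_and_rank(scores, labels, k=2)
# acc = 2/3, mean_rank = 1.5
```

Two models with identical top-k accuracy can differ substantially in mean rank, which is exactly the gap the ranking-aware loss proposed in this paper targets.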

