CURRICULUM-INSPIRED TRAINING FOR SELECTIVE NEURAL NETWORKS

Abstract

We consider the problem of training neural network models for selective classification, where the model has a reject option that allows it to abstain from predicting on certain examples as needed. Recent advances in curriculum learning have demonstrated the benefit of leveraging example difficulty scores when training deep neural networks in typical classification settings. Example difficulty scores are even more important in selective classification, as a lower prediction error rate can be achieved by rejecting hard examples and accepting easy ones. In this paper, we propose a curriculum-inspired method to train selective neural network models by leveraging example difficulty scores. Our method tailors the curriculum idea to selective neural network training by calibrating the ratio of easy and hard examples in each mini-batch, and by exploiting difficulty ordering at the mini-batch level. Our experimental results demonstrate that our method outperforms both the state-of-the-art and alternative methods that use vanilla curriculum techniques for training selective neural network models.

1. INTRODUCTION

In selective classification, the goal is to design a predictive model that is allowed to abstain from making a prediction whenever it is not sufficiently confident. A model with this reject option is called a selective model. In other words, a selective model rejects certain examples as appropriate and provides predictions only for accepted examples. In many real-life scenarios, such as medical diagnosis, robotics, and self-driving cars (Kompa et al., 2021), selective models are used to minimize the risk of wrong predictions on hard examples by abstaining from any prediction and possibly seeking human intervention. In this paper, we focus on selective neural network models, which are essentially neural network models with the reject option. These models have been shown to achieve impressive results (Geifman & El-Yaniv, 2019; 2017; Liu et al., 2019). Specifically, Geifman & El-Yaniv (2019) proposed a neural network model, SELECTIVENET, that allows end-to-end optimization of selective models. SELECTIVENET contains a main body block followed by three heads: one for minimizing the error rate among the accepted examples, one for selecting the examples for acceptance or rejection, and one for the auxiliary task of minimizing the error rate on all examples. These three heads are illustrated later in Figure 1. The final goal of this model is to minimize the error rate among the accepted examples while satisfying a coverage constraint, i.e., a lower bound on the percentage of examples that must be accepted. The coverage constraint is imposed to avoid the trivial solution of rejecting all examples to achieve a 0% error rate. Ideally, the model should reject hard examples and accept easy ones to lower its overall error rate. While it is clear that difficulty scores are helpful, they are typically unknown in most settings.
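The coverage-constrained objective described above can be made concrete with a small sketch. The following follows the shape of the SelectiveNet selective-risk objective (selective risk normalized by empirical coverage, plus a quadratic penalty for falling below the target coverage); the penalty weight `lam` and the omission of the auxiliary head's loss term are simplifications on our part, not the paper's exact formulation.

```python
def selective_loss(losses, select_probs, target_coverage, lam=32.0):
    """Selective risk with a quadratic coverage penalty, in the spirit of
    the SelectiveNet objective (Geifman & El-Yaniv, 2019).

    losses        -- per-example prediction losses from the prediction head
    select_probs  -- selection head outputs in [0, 1] (soft accept/reject)
    """
    n = len(losses)
    coverage = sum(select_probs) / n  # empirical (soft) coverage
    # Risk over the (softly) accepted examples, normalized by coverage.
    sel_risk = sum(l * g for l, g in zip(losses, select_probs)) / (coverage * n)
    # Penalize the model only when coverage drops below the target c.
    penalty = lam * max(0.0, target_coverage - coverage) ** 2
    return sel_risk + penalty
```

Note how the penalty rules out the trivial solution: pushing all selection probabilities toward zero drives coverage below the target and makes the penalty term dominate.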
Therefore, to leverage difficulty scores we must overcome two challenges: (1) how to obtain the difficulty scores as accurately as possible, and (2) how to best utilize them in a selective neural network model. Recent curriculum learning techniques have investigated how to use example difficulty scores to improve neural network models' performance (Hacohen & Weinshall, 2019; Wu et al., 2020).

Contributions. In this paper, we propose a new method, inspired by curriculum learning, for training selective neural network models. Our curriculum-inspired method has two benefits. First, existing training methods ignore the coverage constraint when constructing the mini-batch for each iteration, which we show introduces misguiding noise into the training process of selective neural network models. Taking advantage of the estimated difficulty scores, our method calibrates the ratio of easy and hard examples in each mini-batch to match the value implied by the coverage constraint. Second, our method also adopts the curriculum idea of gradually increasing the difficulty level over the course of training, which has been demonstrated to improve neural network training (Hacohen & Weinshall, 2019; Wang et al., 2021). However, instead of relying on existing vanilla curriculum techniques that exploit difficulty ordering at the example level, our method is tailored to selective neural network training by exploiting difficulty ordering at the mini-batch level. We will show that this change is necessary because of the need to match the coverage constraint, as just explained. To summarize, we make the following contributions:

1. To the best of our knowledge, we are the first to investigate the benefits of leveraging example difficulty scores to improve selective neural network model training.

2. We design a new curriculum-inspired method with two benefits for training selective neural networks: (a) calibrating the ratio of easy and hard examples in each mini-batch, which has been ignored by existing methods (Section 4.1); (b) adopting a curriculum idea tailored to selective neural network training by exploiting difficulty ordering at the mini-batch level (Section 4.2).

3. We conduct extensive experiments demonstrating that our method improves the converged error rate (up to 13% lower) of selective neural network models compared to the state-of-the-art (Section 5.2). We also show that our method outperforms alternative designs using vanilla techniques from the existing curriculum learning literature (Section 5.3).
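The two ideas above can be sketched together: compose each mini-batch so that roughly a `coverage` fraction of its slots hold easy examples, then order whole batches from easy to hard instead of ordering individual examples. This is an illustrative reconstruction under our own assumptions (the easy/hard split at the coverage quantile, shuffling within pools, and sorting batches by mean difficulty are our choices), not the paper's exact algorithm.

```python
import random

def build_batches(difficulty, coverage, batch_size, seed=0):
    """Return mini-batches (lists of example indices) that (1) hold
    roughly `coverage` easy and `1 - coverage` hard examples each, and
    (2) are ordered easy-to-hard at the *batch* level."""
    rng = random.Random(seed)
    # Rank examples by difficulty; the easiest `coverage` fraction is "easy".
    order = sorted(range(len(difficulty)), key=difficulty.__getitem__)
    cut = int(len(order) * coverage)
    easy, hard = order[:cut], order[cut:]
    rng.shuffle(easy)
    rng.shuffle(hard)
    n_easy = max(1, int(batch_size * coverage))  # easy slots per batch
    n_hard = batch_size - n_easy                 # hard slots per batch
    batches = []
    while len(easy) >= n_easy and len(hard) >= n_hard:
        batches.append([easy.pop() for _ in range(n_easy)]
                       + [hard.pop() for _ in range(n_hard)])
    # Mini-batch-level curriculum: present easier batches first.
    batches.sort(key=lambda b: sum(difficulty[i] for i in b) / len(b))
    return batches
```

The key difference from a vanilla example-level curriculum is that every batch already mirrors the easy/hard mix the coverage constraint demands, so the selection head never sees a batch it should mostly reject.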



(These real-life scenarios are further elaborated in Section A.1 in the appendix.)



2. RELATED WORK

Curriculum learning. To the best of our knowledge, existing curriculum learning techniques consider only the typical classification setting, where the error rate on all examples should be minimized. These techniques often use a scoring function to estimate the difficulty scores. There are different approaches to constructing a scoring function from a reference model, such as (1) the confidence score (Hacohen & Weinshall, 2019), (2) the learned epoch/iteration (Wu et al., 2020), and (3) the estimated c-score (Jiang et al., 2020). Prior work has shown that these scoring functions are highly correlated with one another and lead to similar performance (Wu et al., 2020). Existing curriculum learning techniques use the estimated difficulty scores only to decide the order in which examples are exposed to the model during training: typically, they expose easy examples to the model during the early phase of training and gradually transition to hard examples as training progresses. In this paper, we consider selective classification, which has a more relaxed goal: minimizing the error rate on the accepted examples under a given coverage constraint. Difficulty scores are even more useful in selective classification, because the selection of accepted examples has a direct impact on the model's final performance (i.e., a lower error rate can be achieved by rejecting hard examples and accepting easy ones). This inspires us to design a new method for training selective neural network models.
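As a concrete instance of the first scoring approach above, a trained reference model's softmax confidence on the true class can serve as a difficulty estimate (Hacohen & Weinshall, 2019): low confidence indicates a hard example. The transform `1 - confidence` used here is our illustrative choice of how to turn confidence into a difficulty score.

```python
import math

def confidence_difficulty(logits, label):
    """Difficulty score from a *reference* model's output logits:
    one minus the softmax probability assigned to the true class."""
    m = max(logits)                           # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    p_true = exps[label] / sum(exps)
    return 1.0 - p_true
```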

Selective classification. Prior work on selective classification primarily focuses on adding the reject option to classical learning algorithms such as SVMs (Fumera & Roli, 2002), nearest neighbors (Hellman, 1970), boosting (Cortes et al., 2016), and online learning methods (Cortes et al., 2018). In particular, Geifman & El-Yaniv (2017) apply selective classification techniques in the context of deep neural networks (DNNs). They show how to construct a selective classifier from a trained neural network model, deciding whether or not to reject each example based on a confidence score. They rely on two techniques for extracting confidence scores from a neural network model: Softmax Response (SR) and Monte-Carlo dropout (MC-dropout). SR uses the maximal activation in the softmax layer of a classification model as the confidence score. MC-dropout estimates the confidence score from the statistics of numerous forward passes through the network with dropout applied. Unfortunately, MC-dropout requires hundreds of forward passes for each example, incurring a massive computational overhead. More recently, Geifman & El-Yaniv (2019) propose a selective neural network that jointly learns a predictive function and a selection function. This model is trained end-to-end, resulting in a selective model that is optimized over the covered domain. They show empirically that this selective neural network outperforms previous methods based on SR or MC-dropout. In addition, inspired by portfolio theory, Liu et al. (2019) propose a
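The SR technique described above amounts to thresholding the maximal softmax probability. A minimal sketch follows; the threshold value 0.9 is an illustrative assumption (in practice it is tuned to meet a desired coverage), not a value from the cited work.

```python
import math

def softmax_response(logits):
    """Softmax Response (SR): the maximal softmax activation is the
    confidence score (Geifman & El-Yaniv, 2017)."""
    m = max(logits)                           # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return max(e / total for e in exps)

def selective_predict(logits, threshold=0.9):
    """Abstain (return None) when SR confidence falls below `threshold`;
    otherwise return the predicted class index."""
    if softmax_response(logits) < threshold:
        return None                           # reject: defer to a human
    return max(range(len(logits)), key=logits.__getitem__)
```

Unlike MC-dropout, this needs only the single forward pass that produced the logits, which is why SR is the cheaper of the two confidence estimates.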

