LEARNING WITH INSTANCE-DEPENDENT LABEL NOISE: MAINTAINING ACCURACY AND FAIRNESS

Abstract

Incorrect labels hurt model performance when the model overfits to noise. Many state-of-the-art approaches that address label noise assume that the noise is independent of the input features. In practice, however, label noise is often feature- or instance-dependent, and therefore biased (i.e., some instances are more likely to be mislabeled than others). Approaches that ignore this dependence can produce models with poor discriminative performance and, depending on the task, can exacerbate issues around fairness. In light of these limitations, we propose a two-stage approach to learn from datasets with instance-dependent label noise. Our approach utilizes alignment points, a small subset of data for which we know both the observed and ground truth labels. On many tasks, our approach leads to consistent improvements over the state of the art in discriminative performance (AUROC) while maintaining model fairness (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, the harmonic mean of the AUROC and AUEOC of our approach is 0.84 (SD 0.01), while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach leads to more accurate and fair models compared to existing approaches in the presence of instance-dependent label noise.

1. INTRODUCTION

Datasets used to train machine learning models can contain incorrect labels (i.e., label noise). While label noise is widely studied, the majority of past work focuses on settings where the noise is independent of an instance's features (i.e., instance-independent label noise) [Song et al. (2020)]. However, label noise is sometimes biased and depends on an instance's features (i.e., instance-dependent) [Wei et al. (2022b)], leading to different noise rates within subsets of the data. This can cause models to overfit, and in tasks where the dataset contains instances from different groups corresponding to some sensitive attribute, it can also lead to disparities in performance [Liu (2021)]. For example, consider the task of predicting cardiovascular disease among patients admitted to a hospital. Compared to male patients, female patients may be more likely to be misdiagnosed [Maserejian et al. (2009)] and thus mislabeled, potentially leading to worse predictions for female patients. Although instance-dependent label noise has recently received more attention [Cheng et al. (2020b); Xia et al. (2020); Wang et al. (2021a)], the effect of these approaches on model fairness has been relatively understudied [Liu (2021)]. Here, we address the limitations of current approaches and propose a novel method for learning with instance-dependent label noise, specifically examining how modeling assumptions affect existing issues around model fairness.

Broadly, current work addressing instance-dependent label noise falls into one of two categories: 1) that which learns to identify mislabeled instances [Cheng et al. (2020a); Xia et al. (2022); Zhu et al. (2022a)], and 2) that which learns to optimize a noise-robust objective function [Feng et al. (2020); Wei et al. (2022a)]. In the first category, instances identified as mislabeled are either filtered out [Kim et al. (2021)] or relabeled [Berthon et al. (2021)].
In some settings, this approach can have a negative effect on model fairness. For example, when instances represent individuals belonging to subgroups defined by a sensitive attribute, approaches that filter out mislabeled individuals could ignore a disproportionately higher number of individuals from subgroups with more label noise. While relabeling approaches use all available data, they can be sensitive to assumptions about the noise distribution [Ladouceur et al. (2007)]. In the second category, current approaches rely on objective functions that are less prone to overfitting to the noise, while using all of the data [Chen et al. (2021)]. However, like the first category, many rely on assumptions such as the memorization effect, and thus potentially suffer from the same limitations [Wang et al. (2021a)].

In light of these limitations, we propose an approach that addresses instance-dependent label noise, makes no assumptions about the noise distribution, and uses all data during training. We leverage a set of representative points for which we have access to both the observed and ground truth labels. While past work has used the observed labels as the ground truth labels for anchor points [Xia et al. (2019); Wu et al. (2021)], we consider a different setting in which the ground truth and observed labels do not agree for some points. To make the differentiation clear, we refer to these points as 'alignment points'. Such a setting arises frequently in healthcare: oftentimes, one labels an entire dataset using a proxy function (obtaining observed labels) but also labels a small subset of the data using manual review (obtaining ground truth labels). We use the alignment points to initialize the model's decision boundary and to learn the underlying pattern of label noise during pre-training. We then add the remaining data for fine-tuning, minimizing a weighted cross-entropy loss based on the learned noise pattern. On synthetic and real data, we evaluate our approach in terms of discriminative performance and model fairness, measured using the area under the receiver operating characteristic curve (AUROC) and the area under the equalized odds curve (AUEOC).
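The two-stage procedure just described can be sketched in code. The sketch below is illustrative only: the choice of logistic regression for both the classifier and the noise model, and the down-weighting scheme w = 1 - P(mislabeled | x), are our own assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, w=None, lr=0.1, epochs=500, theta=None):
    """Weighted logistic regression trained by gradient descent."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    theta = np.zeros(d) if theta is None else theta.copy()
    for _ in range(epochs):
        p = sigmoid(X @ theta)
        theta -= lr * (X.T @ (w * (p - y))) / n
    return theta

rng = np.random.default_rng(0)
n = 2000
X = np.hstack([rng.normal(size=(n, 3)), np.ones((n, 1))])  # bias column
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)              # ground truth
flip = X[:, 2] > 1.0                                       # instance-dependent noise
y_obs = np.where(flip, 1 - y, y)                           # observed labels

# Alignment set: instances with both observed and ground-truth labels.
A = rng.choice(n, size=200, replace=False)

# Stage 1 (pre-training): initialize the decision boundary on the
# alignment points' ground-truth labels, and learn the noise pattern
# by modeling P(mislabeled | x) on the alignment set.
theta = fit_logreg(X[A], y[A])
noise_theta = fit_logreg(X[A], (y_obs[A] != y[A]).astype(float))

# Stage 2 (fine-tuning): weighted cross-entropy over all observed labels,
# down-weighting instances the noise model flags as likely mislabeled.
w = 1.0 - sigmoid(X @ noise_theta)
theta = fit_logreg(X, y_obs, w=w, theta=theta)

acc = ((sigmoid(X @ theta) > 0.5).astype(int) == y).mean()
```

Because the weights shrink the loss contribution of likely-mislabeled instances rather than discarding them, all data participate in fine-tuning, in keeping with the stated design goals.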
We demonstrate that our approach improves on state-of-the-art baselines from the noisy labels and fairness literature, such as stochastic label noise [Chen et al. (2021)] and group-based peer loss [Wang et al. (2021b)]. Overall, our contributions are: 1) a novel approach to learn from datasets with instance-dependent noise; 2) a systematic examination of different settings of label noise, showing where approaches fail with respect to discriminative performance and fairness; 3) empirical results showing that the proposed approach is robust to both the noise rate and the amount of noise disparity between subgroups, in terms of the model's ability to maintain discriminative performance and fairness; and 4) a demonstration of how the performance of the proposed approach changes when assumptions about the alignment set are violated.
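The evaluation combines AUROC with the area under the equalized odds curve. This excerpt does not define the AUEOC precisely, so the following is only one plausible sketch: at each classification threshold it scores how closely the two groups' true and false positive rates match (1 = perfectly equalized odds), then averages over thresholds. The function name and aggregation are our own assumptions.

```python
import numpy as np

def equalized_odds_area(y_true, y_score, group, thresholds=None):
    """Hypothetical AUEOC sketch: average, over thresholds, of
    1 - (|TPR_0 - TPR_1| + |FPR_0 - FPR_1|) / 2 for two groups."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    scores = []
    for t in thresholds:
        pred = (y_score >= t).astype(int)
        gaps = []
        for label in (1, 0):  # label 1 -> TPR gap, label 0 -> FPR gap
            rates = []
            for g in (0, 1):
                mask = (group == g) & (y_true == label)
                rates.append(pred[mask].mean() if mask.any() else 0.0)
            gaps.append(abs(rates[0] - rates[1]))
        scores.append(1.0 - 0.5 * sum(gaps))
    return float(np.mean(scores))

# When both groups have identical label/score distributions, the
# rate gaps are zero at every threshold and the score is 1.
y_true = np.array([0, 1, 0, 1])
y_score = np.array([0.2, 0.8, 0.2, 0.8])
group = np.array([0, 0, 1, 1])
aueoc = equalized_odds_area(y_true, y_score, group)  # -> 1.0
```

Under this construction, a harmonic mean of AUROC and AUEOC (as reported in the abstract) rewards models that are simultaneously accurate and fair, since a low value on either axis drags the combined score down.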

2. METHODS

We introduce a two-stage approach for learning with instance-dependent label noise that leverages a small set of alignment points for which we have both observed and ground truth labels.

Notation and Problem Setup. Our main notation is in Table 1. We train a model using a dataset D = {(x^(i), ỹ^(i))}_{i=1}^n, where x ∈ R^d and y ∈ {1, 2, ..., c}, to learn a function f : x → y that maps unseen instances into one of c classes based on their feature vectors. In the presence of noisy labels, ỹ is not always equal to y. For each alignment point, we know both ỹ and y; this is in contrast to the rest of the data, where we only know ỹ. We aim to learn model parameters θ such that θ(x) represents the predicted class probabilities (i.e., ŷ). Alignment points are similar to anchor points [Xia et al. (2019)], but we do not assume that ỹ^(i) = y^(i) for these points.

For the rest of the paper, we focus on the following case of instance-dependent noise. Let f be the function used to generate the ground truth labels (i.e., f(x) = y), and let m be the function used to determine an instance's risk of being mislabeled (i.e., ỹ ≠ y if m(x) is above some threshold). As a toy example, suppose f(x) = 1 for true positive instances, and m(x) = 1 if x_1 > 0.5 and 0 otherwise. Here, instances where x_1 > 0.5 have noisy labels. Although f and m are deterministic here for simplicity, they can also be probabilistic. We denote the set of alignment points (i.e., the alignment set) as A, consisting of instances i for which both ỹ^(i) and y^(i) are known.
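The toy noise model above can be simulated directly. Since the toy example leaves the ground-truth function f unspecified, the linear rule below (and the alignment-set size) is an illustrative assumption; the mislabeling rule m(x) = 1 if x_1 > 0.5 follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5

# Features; a hypothetical rule on the second feature stands in for
# the ground-truth labeling function f, which the toy example leaves open.
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = (X[:, 1] > 0.0).astype(int)

# Mislabeling function m from the toy example: instances with x_1 > 0.5
# receive flipped labels (x_1 is the first feature, X[:, 0]).
mislabeled = X[:, 0] > 0.5
y_tilde = np.where(mislabeled, 1 - y, y)  # observed (possibly noisy) labels

# Alignment set A: a small subset where both y_tilde and y are known.
align_idx = rng.choice(n, size=50, replace=False)
alignment_set = [(X[i], y_tilde[i], y[i]) for i in align_idx]
```

Note that the resulting noise rate is instance-dependent by construction: it is 100% on the region x_1 > 0.5 (about a quarter of uniformly drawn instances) and 0% elsewhere, rather than a single rate applied uniformly at random.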



Table 1: Notation. We summarize our notation, with each symbol in the left column and its description in the right column. Superscripts in parentheses represent specific instances (e.g., x^(i)). Subscripts represent indexes into a vector (e.g., x_i).

