CONFORMAL PREDICTION IS ROBUST TO LABEL NOISE

Abstract

We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground-truth labels. With both theory and experiments, we argue that conformal prediction with noisy labels conservatively covers the clean ground-truth labels except in adversarial cases. This leads us to believe that correcting for label noise is unnecessary except for pathological data distributions or noise sources. In such cases, we can also correct for noise of bounded size within the conformal prediction algorithm in order to ensure correct coverage of the ground-truth labels without regularity assumptions on the scores or the data.

1. INTRODUCTION

In most supervised classification and regression tasks, one would assume the provided labels reflect the ground truth. In reality, this assumption is often violated; see (Cheng et al., 2022; Xu et al., 2019; Yuan et al., 2018; Lee & Barber, 2022; Cauchois et al., 2022). For example, doctors labeling the same medical image may have different subjective opinions about the diagnosis, leading to variability in the ground-truth label itself. In other settings, such variability may arise due to sensor noise, data entry mistakes, the subjectivity of a human annotator, or many other sources. In other words, the labels we use to train machine learning (ML) models may often be noisy in the sense that they are not necessarily the ground truth. Quantifying the prediction uncertainty is crucial in high-stakes applications in general, and especially so in settings where the training data is inexact.

We aim to investigate uncertainty quantification in this challenging noisy setting via conformal prediction, a framework that uses hold-out calibration data to construct prediction sets that are guaranteed to contain the ground-truth labels; see (Vovk et al., 2005; Angelopoulos & Bates, 2021). In short, this paper shows that conformal prediction typically yields confidence sets with conservative coverage when the hold-out calibration data has noisy labels.

We adopt a variation of the standard conformal prediction setup. Consider a calibration data set of i.i.d. observations $\{(X_i, Y_i)\}_{i=1}^{n}$ sampled from an arbitrary unknown distribution $P_{XY}$. Here, $X_i \in \mathbb{R}^p$ is the feature vector that contains $p$ features for the $i$th sample, and $Y_i$ denotes its response, which can be discrete for classification tasks or continuous for regression tasks. Given the calibration dataset, an i.i.d. test data point $(X_{\mathrm{test}}, Y_{\mathrm{test}})$, and a pre-trained model $f$, conformal prediction constructs a set $C(X_{\mathrm{test}})$ that contains the unknown test response, $Y_{\mathrm{test}}$, with high probability, e.g., 90%.
That is, for a user-specified level $\alpha \in (0, 1)$,
$$\mathbb{P}\left(Y_{\mathrm{test}} \in C(X_{\mathrm{test}})\right) \geq 1 - \alpha.$$
This property is called marginal coverage, where the probability is defined over the calibration and test data. In the setting of label noise, we only observe the corrupted labels $\tilde{Y}_i = g(Y_i)$ for some corruption function $g : \mathcal{Y} \times [0, 1] \to \mathcal{Y}$, so the i.i.d. assumption and the marginal coverage guarantee are invalidated. The corruption is random; we will always take the second argument of $g$ to be a random seed $U$ uniformly distributed on $[0, 1]$. To ease notation, we leave the second argument implicit henceforth. Nonetheless, using the noisy calibration data, we seek to form a prediction set $C_{\mathrm{noisy}}(X_{\mathrm{test}})$ that covers the clean, uncorrupted test label, $Y_{\mathrm{test}}$. More precisely, our goal is to delineate when it is possible to achieve this coverage.
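To make the setup concrete, the following is a minimal sketch of split conformal prediction for regression, calibrated on noisy labels and evaluated on clean ones. The model f, the data-generating process, and the additive-Gaussian corruption standing in for g are all illustrative assumptions, not the paper's exact construction; the point is only to show the mechanics of calibration and the conservative coverage one tends to observe when the noise inflates the nonconformity scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an arbitrary pre-trained model f (assumption for this sketch).
def f(x):
    return 2.0 * x

# Simulated i.i.d. calibration data (X_i, Y_i) with clean labels Y,
# of which we only observe noisy versions Y_noisy = g(Y).
n = 2000
X = rng.uniform(-1, 1, size=n)
Y = 2.0 * X + rng.normal(scale=0.3, size=n)      # clean labels (unobserved)
Y_noisy = Y + rng.normal(scale=0.2, size=n)      # corrupted labels g(Y_i)

# Split-conformal calibration at level alpha on the *noisy* labels:
# compute nonconformity scores and take the conformal quantile.
alpha = 0.1
scores = np.abs(Y_noisy - f(X))
k = int(np.ceil((n + 1) * (1 - alpha)))          # conformal quantile index
qhat = np.sort(scores)[k - 1]

# Prediction set for a test point x: C(x) = [f(x) - qhat, f(x) + qhat].
# Evaluate coverage of the clean, uncorrupted test labels Y_test.
m = 5000
X_test = rng.uniform(-1, 1, size=m)
Y_test = 2.0 * X_test + rng.normal(scale=0.3, size=m)
covered = np.abs(Y_test - f(X_test)) <= qhat
print(f"empirical coverage of clean labels: {covered.mean():.3f}")
```

Because the independent additive noise makes the calibration scores stochastically larger than the clean-label scores, the resulting qhat is inflated and the empirical coverage of the clean labels comes out above the nominal 1 - alpha = 0.9, illustrating the conservative behavior the paper argues for.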

