BATCH MULTIVALID CONFORMAL PREDICTION

Abstract

We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. Multivalid coverage guarantees are stronger than marginal coverage guarantees in two ways: (1) They hold even conditional on group membership; that is, the target coverage level 1 - α holds conditionally on membership in each group of an arbitrary (potentially intersecting) finite collection G of regions in the feature space. (2) They hold even conditional on the value of the threshold used to produce the prediction set on a given example. In fact, multivalid coverage guarantees hold even when conditioning on group membership and threshold value simultaneously. We give two algorithms: both take as input an arbitrary non-conformity score and an arbitrary collection of possibly intersecting groups G, and can then equip arbitrary black-box predictors with prediction sets. Our first algorithm, BatchGCP, is a direct extension of quantile regression; it needs to solve only a single convex minimization problem, and produces an estimator with group-conditional guarantees for each group in G. Our second algorithm, BatchMVP, is iterative and gives the full guarantees of multivalid conformal prediction: prediction sets that are valid conditionally both on group membership and on the non-conformity threshold. We evaluate the performance of both algorithms in an extensive set of experiments.

1. INTRODUCTION

Consider an arbitrary distribution D over a labeled data domain Z = X × Y. A model is any function h : X → Y for making point predictions. The traditional goal of conformal prediction in the "batch" setting is to take a small calibration dataset of labeled examples sampled from D and use it to endow an arbitrary model h : X → Y with prediction sets T_h(x) ⊆ Y that cover the true label with probability 1 - α marginally, for some target miscoverage rate α: Pr_{(x,y)∼D}[y ∈ T_h(x)] = 1 - α. This is a marginal coverage guarantee because the probability is taken over the randomness of both x and y, without conditioning on anything. In the batch setting (unlike in the sequential setting), labels are not available when the prediction sets are deployed. Our goal in this paper is to give simple, practical algorithms in the batch setting that give stronger-than-marginal guarantees: the kinds of multivalid guarantees introduced by Gupta et al. (2022); Bastani et al. (2022) in the sequential prediction setting.

Following the literature on conformal prediction (Shafer and Vovk, 2008), our prediction sets are parameterized by an arbitrary non-conformity score s_h : Z → R defined as a function of the model h. Informally, smaller values of s_h(x, y) should mean that the label y "conforms" more closely to the prediction h(x) made by the model. For example, in a regression setting in which Y = R, the simplest non-conformity score is s_h(x, y) = |h(x) - y|. By now there is a large literature giving more sophisticated non-conformity scores for both regression and classification problems; see Angelopoulos and Bates (2021) for an excellent recent survey. A non-conformity score function s_h(x, y) induces a distribution over non-conformity scores, and if τ is the 1 - α quantile of this distribution (i.e. Pr_{(x,y)∼D}[s_h(x, y) ≤ τ] = 1 - α), then defining prediction sets as T^τ_h(x) = {y : s_h(x, y) ≤ τ} gives 1 - α marginal coverage.
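As a concrete baseline, the marginal-coverage recipe above can be sketched in a few lines: compute a finite-sample-corrected empirical quantile of the calibration scores, and use it as the threshold τ. The regression score |h(x) - y| and the synthetic data below are illustrative, not from the paper:

```python
import numpy as np

def split_conformal_threshold(scores, alpha):
    """Empirical (1 - alpha) quantile of calibration non-conformity scores.

    Uses the standard finite-sample correction ceil((n+1)(1-alpha))/n so that
    marginal coverage at least 1 - alpha holds for exchangeable data."""
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def in_prediction_set(h_x, y, tau):
    """Membership test for T^tau_h(x) = {y : |h(x) - y| <= tau}."""
    return abs(h_x - y) <= tau

rng = np.random.default_rng(0)
# toy regression: h(x) always predicts 0, labels are standard normal noise,
# so the scores |h(x) - y| are half-normal
cal_scores = np.abs(rng.normal(size=1000))
tau = split_conformal_threshold(cal_scores, alpha=0.1)
test_scores = np.abs(rng.normal(size=5000))
coverage = np.mean(test_scores <= tau)  # typically close to 0.9
```

The `method="higher"` rounding makes the threshold (weakly) conservative, matching the usual split conformal guarantee.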
Split conformal prediction (Papadopoulos et al., 2002; Lei et al., 2018) simply finds a threshold τ that is an empirical 1 - α quantile of the non-conformity scores on the calibration set, and then deploys the prediction sets T^τ_h(x) defined above. Our goal is to give stronger coverage guarantees, and to do so, rather than learning a single threshold τ from the calibration set, we learn a function f : X → R mapping unlabeled examples to thresholds. Such a mapping f induces prediction sets defined as follows: T^f_h(x) = {y : s_h(x, y) ≤ f(x)}. Our goal is to find functions f : X → R that give valid conditional coverage guarantees of two sorts. Let G be an arbitrary collection of groups: each group g ∈ G is some subset of the feature domain (g ∈ 2^X) about which we make no assumption, and we write g(x) = 1 to denote that x is a member of group g. An example x might be a member of multiple groups in G. We want to learn a function f that gives group-conditional coverage guarantees, i.e. such that for every g ∈ G: Pr_{(x,y)∼D}[y ∈ T^f_h(x) | g(x) = 1] = 1 - α. Here we can think of the groups as representing, e.g., demographic groups (broken down by race, age, gender, etc.) in settings in which we are concerned about fairness, or any other categories that we think might be relevant to the domain at hand. Since our functions f now map different examples x to different thresholds f(x), we also want our guarantees to hold conditional on the chosen threshold, which we call a threshold-calibrated guarantee. This rules out algorithms that achieve their target coverage rate by overcovering for some thresholds and undercovering for others, for example by randomizing between full and empty prediction sets. That is, we require: Pr_{(x,y)∼D}[y ∈ T^f_h(x) | g(x) = 1, f(x) = τ] = 1 - α simultaneously for every g ∈ G and every τ ∈ R.
If f is such that its corresponding prediction sets T^f_h(x) satisfy both group-conditional and threshold-calibrated guarantees simultaneously, then we say that it provides full multivalid coverage.
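To make the group-conditional notion concrete, the sketch below (with hypothetical synthetic data and a single binary group) estimates empirical coverage conditional on group membership, and shows how one marginal threshold can undercover a "hard" group while a group-aware threshold function f restores per-group coverage:

```python
import numpy as np

def group_conditional_coverage(scores, f_vals, group_masks):
    """Empirical coverage Pr[s_h(x,y) <= f(x) | g(x) = 1] per group.

    scores:      non-conformity scores s_h(x_i, y_i) on a holdout set
    f_vals:      thresholds f(x_i) assigned by the threshold function
    group_masks: dict mapping group name -> boolean membership array"""
    covered = scores <= f_vals
    return {g: covered[m].mean() for g, m in group_masks.items() if m.any()}

rng = np.random.default_rng(1)
n = 4000
in_g = rng.integers(0, 2, size=n).astype(bool)              # membership in group g
# group g is "harder": its non-conformity scores are twice as spread out
scores = np.abs(rng.normal(scale=np.where(in_g, 2.0, 1.0)))
masks = {"g": in_g, "not_g": ~in_g}

# a single marginal threshold: undercovers g, overcovers its complement
tau = np.quantile(scores, 0.9)
marginal = group_conditional_coverage(scores, np.full(n, tau), masks)

# a group-aware f(x): separate 0.9-quantile threshold per group
f_vals = np.where(in_g, np.quantile(scores[in_g], 0.9),
                  np.quantile(scores[~in_g], 0.9))
groupwise = group_conditional_coverage(scores, f_vals, masks)
```

Here the groups are disjoint, so per-group quantiles suffice; the point of the paper's algorithms is to handle arbitrary intersecting collections G, where no such direct decomposition exists.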

1.1. OUR RESULTS

We design, analyze, and empirically evaluate two algorithms: one for giving group-conditional guarantees for an arbitrary collection of groups G, and the other for giving full multivalid coverage guarantees for an arbitrary collection of groups G. We give PAC-style guarantees (Park et al., 2019), which means that with high probability over the draw of the calibration set, our deployed prediction sets have their desired coverage properties on the underlying distribution. Thus our algorithms also offer "training-conditional coverage" in the sense of Bian and Barber (2022). We prove our generalization theorems under the assumption that the data is drawn i.i.d. from some distribution, but note that De Finetti's theorem (Ressel, 1985) implies that our analysis carries over to data drawn from any exchangeable distribution (see Remark C.1).

Group Conditional Coverage: BatchGCP. We first give an exceedingly simple algorithm, BatchGCP (Algorithm 1), to find a model f that produces prediction sets T^f_h with group-conditional (but not threshold-calibrated) coverage guarantees. We consider the class of functions F = {f_λ : λ ∈ R^|G|}: each f_λ ∈ F is parameterized by a vector λ ∈ R^|G| and takes the value f_λ(x) = f_0(x) + Σ_{g ∈ G : g(x)=1} λ_g, where f_0 is some arbitrary initial model. Our algorithm simply finds the parameters λ that minimize the pinball loss of f_λ(x). This is a |G|-dimensional convex optimization problem and so can be solved efficiently using off-the-shelf convex optimization methods. We prove that the resulting function f_λ(x) guarantees group-conditional coverage. This can be viewed as an extension of conformalized quantile regression (Romano et al., 2019), which is also based on minimizing pinball loss. It can also be viewed as an algorithm promising "quantile multiaccuracy", by analogy to (mean) multiaccuracy introduced in Hébert-Johnson et al. (2018); Kim et al. (2019), and is related to similar algorithms for guaranteeing multiaccuracy (Gopalan et al., 2022b). Here pinball loss takes the role that squared loss plays in (mean) multiaccuracy.

Multivalid Coverage: BatchMVP. We next give a simple iterative algorithm, BatchMVP (Algorithm 2), to find a model f that produces prediction sets T^f_h satisfying both group-conditional and threshold-calibrated guarantees simultaneously, i.e. full multivalid guarantees. It iteratively finds groups g ∈ G and thresholds τ such that the current model fails to achieve the target coverage conditional on g(x) = 1 and f(x) = τ, and then "patches" the model so that it does. We show that each patch substantially improves the pinball loss of the model, which implies fast convergence. This can be viewed as an algorithm promising "quantile multicalibration", and it extends related algorithms for guaranteeing mean multicalibration (Hébert-Johnson et al., 2018), which offer similar guarantees for mean (rather than quantile) prediction. Once again, pinball loss takes the role that squared loss takes in the analysis of mean multicalibration.
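The BatchGCP objective can be sketched with plain subgradient descent on the pinball loss over λ ∈ R^|G| (the paper solves the same convex problem with off-the-shelf optimizers; this toy version, with f_0 = 0 and hypothetical synthetic scores, is only illustrative):

```python
import numpy as np

def pinball_loss(f_vals, scores, q):
    """Pinball (quantile) loss at target quantile q = 1 - alpha."""
    u = scores - f_vals
    return np.mean(np.maximum(q * u, (q - 1.0) * u))

def batch_gcp_sketch(scores, group_matrix, q, steps=3000):
    """Fit f_lambda(x) = f0(x) + sum_{g : g(x)=1} lambda_g by subgradient
    descent on pinball loss, taking f0 = 0 for simplicity.
    group_matrix[i, j] = 1 iff calibration example i belongs to group j."""
    n, k = group_matrix.shape
    lam = np.zeros(k)
    for t in range(steps):
        f_vals = group_matrix @ lam
        # subgradient of pinball loss w.r.t. f(x_i): -q if uncovered, 1-q if covered
        grad_f = np.where(scores > f_vals, -q, 1.0 - q)
        lam -= (0.5 / np.sqrt(t + 1.0)) * (group_matrix.T @ grad_f) / n
    return lam

rng = np.random.default_rng(2)
n, q = 3000, 0.9
in_g = rng.random(n) < 0.5                                  # membership in a subgroup g
scores = np.abs(rng.normal(scale=np.where(in_g, 2.0, 1.0)))
# two intersecting groups: the whole domain, and g itself
G = np.column_stack([np.ones(n), in_g.astype(float)])
lam = batch_gcp_sketch(scores, G, q)
f_vals = G @ lam
cov_all = np.mean(scores <= f_vals)
cov_g = np.mean((scores <= f_vals)[in_g])
```

At the minimizer, the first-order conditions of the pinball loss force the empirical coverage within every group (here, "everyone" and g) to be approximately q, which is the mechanism behind the group-conditional guarantee; note this sketch makes no attempt at the threshold-calibrated guarantee that BatchMVP's patching loop adds.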




