EFFICIENT CONFORMAL PREDICTION VIA CASCADED INFERENCE WITH EXPANDED ADMISSION

Abstract

In this paper, we present a novel approach for conformal prediction (CP), in which we aim to identify a set of promising prediction candidates, in place of a single prediction. This set is guaranteed to contain a correct answer with high probability, and is well-suited for many open-ended classification tasks. In the standard CP paradigm, the predicted set can often be unusably large and also costly to obtain. This is particularly pervasive in settings where the correct answer is not unique, and the number of total possible answers is high. We first expand the CP correctness criterion to allow for additional, inferred "admissible" answers, which can substantially reduce the size of the predicted set while still providing valid performance guarantees. Second, we amortize costs by conformalizing prediction cascades, in which we aggressively prune implausible labels early on by using progressively stronger classifiers, again while still providing valid performance guarantees. We demonstrate the empirical effectiveness of our approach for multiple applications in natural language processing and computational chemistry for drug discovery.

1. INTRODUCTION

The ability to provide precise performance guarantees is critical to many classification tasks (Amodei et al., 2016; Jiang et al., 2012; 2018). Yet achieving perfect accuracy with only single guesses is often out of reach due to noise, limited data, insufficient modeling capacity, or other pitfalls. Nevertheless, in many applications it can be more feasible, and ultimately just as useful, to hedge predictions by having the classifier return a set of plausible options, one of which is likely to be correct. Consider the example of information retrieval (IR) for fact verification. Here the goal is to retrieve a snippet of text at some granularity (e.g., a sentence, paragraph, or article) that can be used to verify a given claim. Large resources, such as Wikipedia, can contain millions of candidate snippets, many of which may independently serve as viable evidence. A good retriever should make precise snippet suggestions quickly, but without excessively sacrificing sensitivity (i.e., recall).

Conformal prediction (CP) is a methodology for placing exactly that sort of bet on which candidates to retain (Vovk et al., 2005). Concretely, suppose we are given n examples, (X_i, Y_i) ∈ 𝒳 × 𝒴, i = 1, …, n, as training data, drawn exchangeably from an underlying distribution P. In our IR setting, X_i would be the claim in question, Y_i a viable piece of evidence that supports or refutes it, and 𝒴 a large corpus (e.g., Wikipedia). Next, let X_{n+1} be a new exchangeable test example (e.g., a new claim to verify) for which we would like to predict the paired y ∈ 𝒴. The aim of conformal prediction is to construct a set of candidates C_n(X_{n+1}) likely to contain Y_{n+1} (e.g., the relevant evidence) with distribution-free marginal coverage at a tolerance level ε ∈ (0, 1):

    P(Y_{n+1} ∈ C_n(X_{n+1})) ≥ 1 − ε,  for all distributions P.    (1)

The marginal probability above is taken over all n + 1 calibration and test points {(X_i, Y_i)}_{i=1}^{n+1}. A classifier is considered valid if the frequency of the error event Y_{n+1} ∉ C_n(X_{n+1}) does not exceed ε. In our IR setting, this means including a correct snippet at least a (1 − ε)-fraction of the time. Not all valid classifiers, however, are particularly useful (e.g., a trivial classifier that merely returns all
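To make the coverage guarantee in Eq. (1) concrete, the following is a minimal sketch of the standard split conformal procedure that the paper builds on (not the authors' expanded-admission or cascaded method): nonconformity scores of true labels on a held-out calibration set determine a threshold, and the prediction set keeps every candidate label scoring at or below it. Function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def conformal_prediction_set(cal_scores, test_scores, epsilon):
    """Split conformal prediction at tolerance epsilon.

    cal_scores:  nonconformity scores of the TRUE labels on n held-out
                 calibration examples (lower = more conforming).
    test_scores: nonconformity score of each candidate label for one
                 new test example X_{n+1}.
    Returns the indices of candidate labels kept in C_n(X_{n+1}).
    """
    n = len(cal_scores)
    # Finite-sample corrected quantile level: ceil((n+1)(1-eps)) / n.
    # Thresholding at this empirical quantile yields marginal coverage
    # P(Y_{n+1} in C_n(X_{n+1})) >= 1 - epsilon under exchangeability.
    q_level = min(np.ceil((n + 1) * (1 - epsilon)) / n, 1.0)
    q_hat = np.quantile(cal_scores, q_level, method="higher")
    # Keep every candidate that conforms at least as well as q_hat.
    return [i for i, s in enumerate(test_scores) if s <= q_hat]

# Example: 9 calibration scores, epsilon = 0.2.
cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
kept = conformal_prediction_set(cal, [0.05, 0.5, 0.95], epsilon=0.2)
```

Note that smaller ε (stricter coverage) pushes the threshold q_hat up and the prediction set grows, which is exactly the tension the paper targets: valid sets can become unusably large.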

