PREDICTIVE INFERENCE WITH FEATURE CONFORMAL PREDICTION

Abstract

Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conformal prediction is conventionally performed in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation learning. From a theoretical perspective, we demonstrate that feature conformal prediction provably outperforms regular conformal prediction under mild assumptions. Our approach can be combined not only with vanilla conformal prediction, but also with other adaptive conformal prediction methods. Apart from experiments on existing predictive inference benchmarks, we also demonstrate the state-of-the-art performance of the proposed methods on large-scale tasks such as ImageNet classification and Cityscapes image segmentation.

1. INTRODUCTION

Although machine learning models work well in numerous fields (Silver et al., 2017; Devlin et al., 2019; Brown et al., 2020), they usually suffer from over-confidence, yielding unsatisfactory uncertainty estimates (Guo et al., 2017a; Chen et al., 2021; Gawlikowski et al., 2021). To address this, a multitude of uncertainty quantification techniques have been developed, including calibration (Guo et al., 2017b; Minderer et al., 2021), Bayesian neural networks (Smith, 2014; Blundell et al., 2015), and many others (Sullivan, 2015). Among these techniques, conformal prediction (CP) stands out for its simplicity and low computational cost (Vovk et al., 2005; Shafer & Vovk, 2008; Angelopoulos & Bates, 2021). Intuitively, conformal prediction first splits the dataset into a training fold and a calibration fold, then trains a machine learning model on the training fold, and finally constructs the confidence band via a non-conformity score evaluated on the calibration fold. Notably, the coverage of the resulting confidence band is guaranteed under the assumption that the data are exchangeable. With such a guarantee, conformal prediction has been shown to perform promisingly in numerous realistic applications (Lei & Candès, 2021b; Angelopoulos et al., 2022).

Despite its remarkable effectiveness, vanilla conformal prediction (vanilla CP) operates only in the output space, which is not the only possibility. The feature space of a deep network is an appealing alternative, as it carries the powerful inductive bias of deep representation learning. Take the image segmentation problem as an example: we expect a predictive model to be certain in informative regions (e.g., those containing clear objects) and uncertain elsewhere. Since different images have different object boundary regions, it is inappropriate to return the same uncertainty for every position, as standard conformal prediction does.
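The split conformal procedure described above can be sketched in a few lines. The sketch below uses the standard absolute-residual non-conformity score for regression; the function names and the trivial mean-predictor are illustrative choices, not the paper's setup.

```python
import numpy as np

def split_conformal_band(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction with the absolute-residual score.

    `model` is any fitted predictor exposing `.predict`; returns lower and
    upper bounds of a (1 - alpha) confidence band for the test points.
    """
    # Non-conformity scores on the calibration fold.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q  # same half-width q for every test point

# Toy usage: a "model" that predicts the mean of its training labels.
class MeanModel:
    def fit(self, X, y):
        self.mu = y.mean()
        return self
    def predict(self, X):
        return np.full(len(X), self.mu)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = rng.normal(size=200)
model = MeanModel().fit(X[:100], y[:100])
lo, hi = split_conformal_band(model, X[100:150], y[100:150], X[150:], alpha=0.1)
```

Note that the half-width `q` is identical for every test point, which is precisely the limitation motivating feature-level approaches in the segmentation example above.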
Nonetheless, if we instead employ conformal prediction in the more meaningful feature space, then even though all images share the same uncertainty in this intermediate space, the pixels exhibit effectively different uncertainty in the output space after a non-trivial non-linear transformation (see Figure 3). In this work, we thus propose the Feature Conformal Prediction (Feature CP) framework, which deploys conformal prediction in the feature space rather than the output space (see Figure 1).

Two issues remain to be solved before Feature CP can be performed: (a) commonly used non-conformity scores require a ground-truth term, but the ground truth in feature space is not given; and (b) transferring the confidence band from the feature space to the output space is non-trivial. To solve problem (a), we propose a new non-conformity score based on the notion of a surrogate feature, which replaces the ground-truth term in previous non-conformity scores. As for (b), we propose two methods: Band Estimation, which calculates an upper bound of the confidence band, and Band Detection, which determines whether a response lies within the confidence band. More interestingly, these feature-level techniques are quite general and can be deployed in other distribution-free inference algorithms, e.g., conformalized quantile regression (CQR). This shows the great potential application impact of the proposed Feature CP methodology (see the discussion in Appendix B.4).

From a theoretical perspective, we demonstrate that Feature CP is provably more efficient, in the sense that it yields shorter confidence bands than vanilla CP, provided that the feature space satisfies the cubic conditions. These cubic conditions characterize the feature space from three perspectives: length preservation, expansion, and quantile stability (see Theorem 6).
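As a rough illustration of the surrogate-feature idea, the sketch below computes a feature-space non-conformity score for a toy two-layer network: starting from the encoder output, gradient descent finds a nearby feature whose prediction matches the label, and the score is the distance between the two. The encoder, head, optimizer, and step size here are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat = 5, 8
W1 = rng.normal(size=(d_feat, d_in))   # encoder weights (toy example)
W2 = rng.normal(size=(1, d_feat))      # prediction-head weights (toy example)

def encoder(x):        # f: input -> feature
    return np.tanh(W1 @ x)

def head(v):           # g: feature -> output
    return W2 @ v

def surrogate_feature(x, y, steps=100):
    """Find a feature v near f(x) whose prediction g(v) matches y.

    This stands in for the unknown ground-truth feature; for a linear head
    the objective ||g(v) - y||^2 is quadratic and gradient descent with a
    step size below 1/L (L = 2 ||W2||^2) converges.
    """
    lr = 0.25 / float(np.sum(W2 ** 2))
    v = encoder(x).copy()
    for _ in range(steps):
        grad = 2 * W2.T @ (head(v) - y)   # gradient of ||g(v) - y||^2 in v
        v -= lr * grad
    return v

def feature_score(x, y):
    # Non-conformity score: distance between f(x) and the surrogate feature.
    return np.linalg.norm(surrogate_feature(x, y) - encoder(x))

x = rng.normal(size=d_in)
y = np.array([0.7])
s = feature_score(x, y)
```

Calibrating a quantile of such scores and mapping the resulting feature-space ball through the head is what requires the Band Estimation and Band Detection steps discussed above.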
At a colloquial level, the cubic conditions assume that, in the feature space, individual non-conformity scores lie closer to their quantiles, which reduces the cost of the quantile operation. We empirically validate that the feature space in deep learning satisfies the cubic conditions, and thus, according to our theoretical analysis, yields a better confidence band with a shorter length (see Figure 2).

Our contributions can be summarized as follows:

• We propose Feature CP, together with a corresponding non-conformity score and an uncertainty band estimation method. The proposed method no longer treats the trained model as a black box but exploits the semantic information in the feature space. Moreover, our approach can be deployed directly with any pretrained model as a plug-in component, without the need for re-training under specially designed learning criteria.

• Theoretical evidence guarantees that Feature CP is both (a) efficient, in that it yields shorter confidence bands, and (b) effective, in that the empirical coverage provably exceeds the given confidence level, under reasonable assumptions.

• We conduct extensive experiments under both synthetic and realistic settings (e.g., pixel-level image segmentation) to corroborate the effectiveness of the proposed algorithm. Besides, we demonstrate the universal applicability of our method by deploying feature-level operations to improve other adaptive conformal prediction methods such as CQR.



Figure 1: Illustration of vanilla CP (left) vs Feature CP (right). Feature CP operates in the semantic feature space, as opposed to the commonly adopted output space. These methods are described in further detail in Sections 3 and 4.

Code availability: //github.com/AlvinWen428

