CONDITIONAL COVERAGE ESTIMATION FOR HIGH-QUALITY PREDICTION INTERVALS

Anonymous

Abstract

Deep learning has achieved state-of-the-art performance in generating high-quality prediction intervals (PIs) for uncertainty quantification in regression tasks. The high-quality criterion requires PIs to be as narrow as possible, whilst maintaining a pre-specified level of data (marginal) coverage. However, most existing works on high-quality PIs lack accurate information on conditional coverage, which may cause unreliable predictions if it is significantly smaller than the marginal coverage. To address this problem, we propose a novel end-to-end framework that outputs high-quality PIs and simultaneously provides an estimate of their conditional coverage. In doing so, we design a new loss function that is both easy to implement and theoretically justified via an exponential concentration bound. Our evaluation on real-world benchmark datasets and synthetic examples shows that our approach not only outperforms state-of-the-art methods for high-quality PIs in terms of average PI width, but also accurately estimates conditional coverage information that is useful in assessing model uncertainty.

1. INTRODUCTION

Prediction intervals (PIs) are poised to play an increasingly prominent role in uncertainty quantification for regression tasks (Khosravi et al., 2010; 2011; Galván et al., 2017; Rosenfeld et al., 2018; Tagasovska & Lopez-Paz, 2018; 2019; Romano et al., 2019; Wang et al., 2019; Kivaranovic et al., 2020). A high-quality PI should be as narrow as possible, whilst maintaining a pre-specified level of data coverage, or marginal coverage (Pearce et al., 2018). Compared with PIs obtained from coverage-only considerations, the "high-quality" criterion is beneficial in balancing marginal coverage probability against interval width. However, the conditional coverage given a feature, which is critical for making reliable context-based decisions, is unassessed and missing in most existing works on high-quality PIs. In the presence of heteroskedasticity and model misspecification, the marginal coverage can be very different from the conditional coverage at a given point, which affects downstream decision-making tasks that rely on the uncertainty information provided by the PI. Our main goal is to meaningfully incorporate and assess conditional coverage in high-quality PIs.

Conditional coverage estimation is challenging for two reasons. The first is that the natural evaluation metric for conditional coverage error, an L_p distance between the estimated and ground-truth conditional coverages, is difficult to compute, as it requires the conditional probability given a feature x, which is arguably as challenging as the regression problem itself. Our first goal in this paper is to address this issue by developing a new metric, the calibration-based conditional coverage error, for measuring the quality of conditional coverage estimates. Our approach is inspired by the calibration notion in classification (Guo et al., 2017). The basic idea is to relax conditional coverage at any given point to coverage averaged over all points that share the same estimated value.
An estimator satisfying this relaxed property is regarded as well-calibrated. In regression, the calibration-based conditional coverage error provides a middle ground between enforcing marginal coverage (which lacks any conditional information) and conditional coverage (which is computationally intractable). Compared with conditional coverage, this middle-ground metric can be viewed as a "dimension reduction" of the conditioning variable from the original sample space to the interval [0, 1], so that we can easily discretize it to compute empirical metric values.

The second challenge is the discontinuity of the above metrics, which hinders efficient training of PIs that are both high-quality and carry reliable conditional coverage information. To address this, we design a new loss function that combines the high-quality criterion with a coverage assessment loss. The latter can be flexibly added as a separate module to any neural network (NN) used to train PIs. It is based on an empirical version of a tight upper bound on the coverage error in terms of a Kullback-Leibler (KL) divergence, which can be readily employed in gradient descent. We theoretically show, via a concentration bound, how training with our proposed loss function attains this upper-bounding value. We also demonstrate the empirical performance of our approach in terms of PI quality and conditional coverage assessment against benchmark methods.
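To make the discretization idea concrete, the following is a minimal sketch of an ECE-style empirical estimate of the calibration-based conditional coverage error: points are binned by their estimated coverage, and within each bin the average estimate is compared with the empirical coverage frequency. The function name and the fixed-width binning scheme are our own illustration, not necessarily the paper's exact metric.

```python
import numpy as np

def calibration_coverage_error(p_hat, covered, n_bins=10):
    """ECE-style sketch of calibration-based conditional coverage error.

    p_hat:   estimated conditional coverage for each test point, in [0, 1]
    covered: 1 if y_i fell inside [L(x_i), U(x_i)], else 0
    """
    p_hat = np.asarray(p_hat, dtype=float)
    covered = np.asarray(covered, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    err, n = 0.0, len(p_hat)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # last bin is closed on the right so that p_hat = 1 is included
        mask = (p_hat >= lo) & ((p_hat < hi) if hi < 1.0 else (p_hat <= hi))
        if mask.any():
            # bin weight times |empirical coverage - average estimate|
            err += mask.sum() / n * abs(covered[mask].mean() - p_hat[mask].mean())
    return err
```

A perfectly calibrated estimator drives this quantity to zero, while an estimator that only reports the marginal coverage incurs an error whenever the true conditional coverage varies across bins.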

Summary of Contributions:

(1) We identify the conditional coverage estimation problem as a new challenge for high-quality PIs and introduce a new evaluation metric for coverage estimation.

(2) We propose an end-to-end algorithm that can simultaneously construct high-quality PIs and generate conditional coverage estimates. In addition, we provide theoretical justifications for the effectiveness of our algorithm by developing concentration bounds relating the coverage assessment loss and the conditional coverage error.

(3) By evaluating on benchmark datasets and synthetic examples, we empirically demonstrate that our approach not only achieves high performance on conditional coverage estimation, but also outperforms state-of-the-art algorithms on high-quality PI generation.

2. EVALUATING CONDITIONAL COVERAGE FOR HIGH-QUALITY PIS

Let X ∈ X and Y ∈ Y ⊂ R be random variables denoting the input feature and the label, where the pair (X, Y) follows an (unknown) ground-truth joint distribution π(X, Y). Let π(Y|X) be the conditional distribution of Y given X. We are given the training data D := {(x_i, y_i), i = 1, 2, ..., n}, where the (x_i, y_i) are i.i.d. realizations of the random variables (X, Y). A PI refers to an interval [L(x), U(x)], where L, U are two functions mapping from X to Y trained on the data D. [L(x), U(x)] is called a PI at prediction level 1 − α (0 ≤ α ≤ 1) if its marginal coverage is not less than 1 − α, i.e., P[Y ∈ [L(X), U(X)] | L, U] ≥ 1 − α, where P is taken with respect to a new test point (X, Y) ∼ π. We say that [L(x), U(x)] is of high quality if its marginal coverage attains a pre-specified target prediction level and its width is short on average. In particular, a best-quality PI at prediction level 1 − α is an optimal solution to the following constrained optimization problem:

min_{L,U} E[U(X) − L(X)]   subject to   P[Y ∈ [L(X), U(X)] | L, U] ≥ 1 − α.   (2.1)

The high-quality criterion has been widely adopted in previous work (see Section 6). However, this criterion alone may fail to carry important model uncertainty information at specific test points. Consider a simple example where x ∼ Uniform[0, 1], y = 0 for x ∈ [0, 0.95], and y|x ∼ Uniform[0, 1] for x ∈ (0.95, 1]. Then, according to equation 2.1, a best-quality 95% PI is precisely L(x) = U(x) = 0 for all x ∈ [0, 1]. This PI has nonconstant coverage when we condition on different points (coverage 1 for x ∈ [0, 0.95] and 0 for x ∈ (0.95, 1]), which can deviate significantly from the overall coverage of 95%. More examples highlighting the need for conditional coverage information can be found in our numerical experiments in Section 5.1. To mitigate this drawback of the high-quality criterion, we define:

Definition 2.1 (Conditional Coverage and Its Estimator).
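The gap between marginal and conditional coverage in this toy example can be checked by direct simulation. The sketch below samples from the stated distribution and evaluates the degenerate PI L(x) = U(x) = 0; the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, n)
# y = 0 on [0, 0.95]; y | x ~ Uniform[0, 1] on (0.95, 1]
y = np.where(x <= 0.95, 0.0, rng.uniform(0.0, 1.0, n))

# Degenerate best-quality 95% PI from the example: L(x) = U(x) = 0 everywhere.
covered = (y >= 0.0) & (y <= 0.0)

print(covered.mean())             # marginal coverage, ~0.95
print(covered[x <= 0.95].mean())  # conditional coverage 1.0 on [0, 0.95]
print(covered[x > 0.95].mean())   # conditional coverage ~0.0 on (0.95, 1]
```

The marginal constraint in (2.1) is met, yet a test point falling in (0.95, 1] is essentially never covered, which is exactly the failure mode the conditional coverage estimator is meant to expose.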
The conditional coverage associated with a PI [L(x), U(x)] is A(x) := P[Y ∈ [L(X), U(X)] | L, U, X = x] for a.e. x ∈ X, where P is taken with respect to π(Y|X). For a (conditional) coverage estimator P̂, which is a measurable function from X to [0, 1], we define its L_p conditional coverage error (CE_p) as

CE_p := ‖ P[Y ∈ [L(X), U(X)] | L, U, X] − P̂(X) ‖_{L_p(X)},

where the L_p-norm is taken with respect to the randomness of X (1 ≤ p ≤ +∞). Note that evaluating CE_p relies on approximating the conditional coverage A(x), which can be as challenging as the original prediction problem. To address this, we leverage the similarity between estimating A(x) and generating prediction probabilities in binary classification, which motivates us to
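Although CE_p is intractable in general, on synthetic data where π(Y|X) is known the ground-truth A(x) is available in closed form and CE_p can be approximated by Monte Carlo. A hedged sketch for the toy example above, using a hypothetical estimator that only reports the marginal coverage:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 100_000)

# Ground-truth conditional coverage A(x) of the PI L(x) = U(x) = 0
# in the toy example: 1 on [0, 0.95], 0 on (0.95, 1].
A = np.where(x <= 0.95, 1.0, 0.0)

# Hypothetical estimator reporting only the marginal coverage 0.95.
P_hat = np.full_like(x, 0.95)

def ce_p(A, P_hat, p=1):
    # Monte Carlo approximation of ( E[ |A(X) - P_hat(X)|^p ] )^(1/p)
    return (np.abs(A - P_hat) ** p).mean() ** (1.0 / p)

print(ce_p(A, P_hat, p=1))  # ~0.095 = 0.95*0.05 + 0.05*0.95
```

The constant estimator incurs a nonzero CE_1 precisely because the true conditional coverage is nonconstant, illustrating why a calibration-based surrogate that can be computed without A(x) is needed in practice.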

