PIVEN: A DEEP NEURAL NETWORK FOR PREDICTION INTERVALS WITH SPECIFIC VALUE PREDICTION

Abstract

Improving the robustness of neural nets in regression tasks is key to their application in multiple domains. Deep learning-based approaches aim to achieve this goal either by improving their prediction of specific values (i.e., point prediction), or by producing prediction intervals (PIs) that quantify uncertainty. We present PIVEN, a deep neural network for producing both a PI and a prediction of specific values. Unlike previous studies, PIVEN makes no assumptions regarding data distribution inside the PI, making its point prediction more effective for various real-world problems. Benchmark experiments show that our approach produces tighter uncertainty bounds than the current state-of-the-art approach for producing PIs, while maintaining comparable performance to the state-of-the-art approach for specific value prediction. Additional evaluation on large image datasets further supports our conclusions.

1. INTRODUCTION

Deep neural networks (DNNs) have been achieving state-of-the-art results in a large variety of complex problems. These include automated decision making and recommendation systems in the medical domain (Razzak et al., 2018), autonomous control of drones (Kaufmann et al., 2018), and self-driving cars (Bojarski et al., 2016). In many of these domains, it is crucial not only that the prediction made by the DNN is accurate, but also that its uncertainty is quantified. Quantifying uncertainty has many benefits, including risk reduction and more reliable planning (Khosravi et al., 2010). In regression, uncertainty is quantified using prediction intervals (PIs), which offer upper and lower bounds on the value of a data point for a given probability (e.g., 95%). Existing non-Bayesian PI generation methods can be roughly divided into two groups: (a) performing multiple runs of the regression problem, as in dropout (Gal & Ghahramani, 2016) or ensemble-based methods (Lakshminarayanan et al., 2017), and then deriving the PI post hoc from the prediction variance; and (b) dedicated architectures for PI generation (Pearce et al., 2018; Tagasovska & Lopez-Paz, 2019). While effective, both approaches have shortcomings. On the one hand, the former group is not optimized for PI generation, having to convert a set of sampled values into a distribution. This lack of PI optimization makes these approaches difficult to use in domains such as financial risk mitigation or scheduling. For example, providing a PI for the number of days a machine can function without malfunctioning (e.g., 30-45 days with 99% certainty) is more valuable than a prediction of the specific time of failure. On the other hand, the latter group, PI-dedicated architectures, provides accurate upper and lower bounds for the prediction, but lacks accuracy in its value predictions.
Consequently, these approaches select the middle of the interval as their value prediction, which is (as we later demonstrate) a sub-optimal strategy that makes assumptions regarding the value distribution within the interval. The shortcomings of PI-dedicated architectures with regard to value prediction are supported both by Pearce et al. (2018) and by our experiments in Section 5. We propose PIVEN (prediction intervals with specific value prediction), a novel approach for simultaneous PI generation and value prediction using DNNs. Our approach combines the benefits of both above-mentioned groups by producing both a PI and a value prediction, while ensuring that the latter is within the former. We follow the experimental procedure of recent works, and compare our approach to the current best-performing methods: Quality-Driven PI (QD) by Pearce et al. (2018) (a dedicated PI generation method), and Deep Ensembles (DE) by DeepMind (Lakshminarayanan et al., 2017). Our results show that PIVEN outperforms QD by producing narrower PIs, while simultaneously achieving comparable results to DE in terms of value prediction. Additional analysis on large image datasets and synthetic data shows that PIVEN performs well on skewed value distributions and can be effectively combined with pre-trained DNN architectures.

2. RELATED WORK

Modeling uncertainty in deep learning has been an active area of research in recent years (Pearce et al., 2018; Qiu et al., 2019; Gal & Ghahramani, 2016; Lakshminarayanan et al., 2017; Keren et al., 2018; Kendall & Gal, 2017; Geifman et al., 2018; Ovadia et al., 2019). Studies in uncertainty modeling and regression can be generally divided into two groups: PI-based and non-PI-based. Non-PI approaches utilize both Bayesian (MacKay, 1992) and non-Bayesian methods. The former define a prior distribution on the weights and biases of a neural net (NN), while inferring a posterior distribution from the training data. Non-Bayesian methods (Gal & Ghahramani, 2016; Lakshminarayanan et al., 2017; Qiu et al., 2019) do not use initial prior distributions. In (Gal & Ghahramani, 2016), Monte Carlo sampling was used to estimate the predictive uncertainty of NNs through the use of dropout over multiple runs. A later study (Lakshminarayanan et al., 2017) employed a combination of ensemble learning and adversarial training to quantify model uncertainty. In an expansion of a previously-proposed approach (Nix & Weigend, 1994), each NN was optimized to learn the mean and variance of the data, assuming a Gaussian distribution. Recently, Qiu et al. (2019) proposed a post-hoc procedure using Gaussian processes to measure uncertainty. PI-based approaches are designed to produce a PI for each sample. Keren et al. (2018) propose a post-processing approach that considers the regression problem as one of classification, and uses the output of the final softmax layer to produce PIs. Tagasovska & Lopez-Paz (2019) propose the use of a loss function designed to learn all conditional quantiles of a given target variable. LUBE (Khosravi et al., 2010) consists of a loss function optimized for the creation of PIs, but has the caveat of not being able to use stochastic gradient descent (SGD) for its optimization.
A recent study (Pearce et al., 2018), inspired by LUBE, proposed a loss function that is both optimized for the generation of PIs and can be optimized using SGD. Each of the two groups presented above tends to underperform when applied to tasks for which its loss function was not optimized: non-PI approaches produce more accurate value predictions, but are not optimized to produce PIs and therefore produce bounds that are less tight; PI-based methods produce tight bounds, but tend to underperform when producing value predictions. Recent studies (Kivaranovic et al., 2019; Romano et al., 2019) attempted to produce both value predictions and PIs by using conformal prediction with quantile regression. While effective, these methods use a complex splitting strategy, where one part of the data is used to produce value predictions and PIs, while the other part is used to further adjust the PIs. Recently, Salem et al. (2020) proposed a method for combining the two, together with post-hoc optimization. In contrast to these approaches, PIVEN produces PIs with value predictions in an end-to-end manner by relying on a novel loss function.
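The quantile-regression approaches mentioned above typically learn each conditional quantile by minimizing the pinball (quantile) loss; training one output per quantile level (e.g., α/2 and 1 − α/2) yields the lower and upper PI bounds. Below is a minimal NumPy sketch of this standard loss; the function name and usage are illustrative, not taken from any of the cited works:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for quantile level tau in (0, 1).

    Under-predictions are weighted by tau and over-predictions by
    (1 - tau), so minimizing this loss drives y_pred toward the
    tau-th conditional quantile of y_true.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Illustrative: with tau = 0.9, under-predicting by 1 costs 0.9,
# while over-predicting by 1 costs only 0.1.
low = pinball_loss(np.array([0.0]), np.array([1.0]), 0.9)   # over-prediction
high = pinball_loss(np.array([1.0]), np.array([0.0]), 0.9)  # under-prediction
```

The asymmetry is the key design point: a head trained with tau = 0.95 will deliberately over-shoot most targets, which is exactly the behavior wanted from an upper PI bound.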

3. PROBLEM FORMULATION

In this work we consider a neural network regressor that processes an input x ∈ X with an associated label y ∈ ℝ, where X can be any feature space (e.g., tabular data, images for age prediction). Let (x_i, y_i) ∈ X × ℝ be a data point along with its target value, and let U_i and L_i be the upper and lower bounds of the PI corresponding to the i-th sample. Our goal is to construct (L_i, U_i, y_i) so that Pr(L_i ≤ y_i ≤ U_i) ≥ 1 − α. We refer to 1 − α as the confidence level of the PI. We define two quantitative measures for the evaluation of PIs, as defined in Khosravi et al. (2010). First, we define coverage as the ratio of dataset samples that fall within their respective PIs. We measure coverage using the prediction interval coverage probability (PICP) metric:

PICP := (1/n) ∑_{i=1}^{n} k_i    (1)

where n denotes the number of samples and k_i = 1 if y_i ∈ (L_i, U_i), and k_i = 0 otherwise. Next, we define a quality metric for the generated PIs with the goal of producing as tight a bound as possible
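Equation 1 amounts to counting how many targets land strictly inside their intervals. A minimal NumPy sketch of the PICP metric (the function name is ours, not from the paper):

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability (Eq. 1).

    k_i = 1 if y_i lies in the open interval (L_i, U_i), else 0;
    PICP is the mean of the k_i over the n samples.
    """
    k = (y > lower) & (y < upper)
    return float(k.mean())

# Illustrative: three of four targets fall inside (0, 5).
y = np.array([1.0, 2.0, 3.0, 10.0])
coverage = picp(y, np.zeros(4), np.full(4, 5.0))  # 0.75
```

For a well-calibrated model at confidence level 1 − α (e.g., 0.95), the PICP measured on held-out data should be at least 1 − α.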

