SECOND-MOMENT LOSS: A NOVEL REGRESSION OBJECTIVE FOR IMPROVED UNCERTAINTIES

Abstract

Uncertainty quantification is one of the most promising approaches for establishing safe machine learning. Despite its importance, it is far from being solved in general, especially for neural networks. One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice. However, it can underestimate uncertainty. We propose a new objective, referred to as the second-moment loss (SML), to address this shortcoming. While the full network is encouraged to model the mean, the dropout sub-networks are explicitly used to optimize the model variance. We analyze the performance of the new objective on various toy and UCI regression datasets. Compared to the state-of-the-art deep ensembles, SML leads to comparable prediction accuracies and uncertainty estimates while requiring only a single model. Under distribution shift, we observe moderate improvements. From a safety perspective, the study of worst-case uncertainties is also crucial; in this regard, we improve considerably. Finally, we show that SML can be successfully applied to SqueezeDet, a modern object-detection network. We improve its uncertainty-related scores without deteriorating regression quality. As a side result, we introduce an intuitive Wasserstein-distance-based uncertainty measure that is non-saturating and thus resolves quality differences between any two uncertainty estimates.

1. INTRODUCTION

Having attracted great attention in both academia and industry, deep neural networks (DNNs, Goodfellow et al. (2016)) are about to become vital components of safety-critical applications. Examples are autonomous driving (Pomerleau, 1989; Bojarski et al., 2016) and medical diagnostics (Liu et al., 2014), where prediction errors potentially put humans at risk. These systems require methods that are robust not only under lab conditions (i.i.d. data sampling) but also under continuous domain shifts; think, e.g., of adults on e-scooters or growing numbers of mobile health sensors. Besides shifts in the data, the data distribution itself poses further challenges: critical situations are (fortunately) rare and thus strongly under-represented in datasets. Despite their rarity, these critical situations have a significant impact on the safety of operations. This calls for comprehensive self-assessment capabilities of DNNs, and recent uncertainty mechanisms can be seen as a step in that direction. While a variety of uncertainty approaches has been established, stable uncertainty quantification is still an open problem. Many recent machine learning applications are, e.g., equipped with Monte Carlo (MC) dropout (Gal & Ghahramani, 2016), which offers conceptual simplicity and scalability. However, it tends to underestimate uncertainties, a disadvantage compared to more recent approaches such as deep ensembles (Lakshminarayanan et al., 2017). We propose an alternative uncertainty mechanism that builds on dropout sub-networks and explicitly optimizes variances (see Fig. 1 for an illustrative example). Technically, this is realized by a simple additive loss term, the second-moment loss. To address the above-outlined requirements for safety-critical systems, we evaluate our approach systematically w.r.t. continuous data shifts and worst-case performances.
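To make the mechanism concrete, the additive objective described above could be sketched as follows in numpy. This is an illustrative sketch only: the function name, the weighting factor `lam`, and the use of absolute deviations are our simplifying assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def second_moment_loss(y, mu_full, mu_sub, lam=1.0):
    """Hedged sketch of a second-moment-style regression objective.

    y       : targets, shape (N,)
    mu_full : predictions of the full network, shape (N,)
    mu_sub  : predictions of S dropout sub-networks, shape (S, N)
    lam     : hypothetical weighting of the second-moment term
    """
    # the full network fits the mean via a standard squared error
    mean_term = np.mean((y - mu_full) ** 2)
    # distance of each dropout sample from the full-network prediction
    sample_dev = np.abs(mu_sub - mu_full[None, :])
    # observed residual magnitude that the sample spread should match
    residual = np.abs(y - mu_full)[None, :]
    second_moment_term = np.mean((sample_dev - residual) ** 2)
    return mean_term + lam * second_moment_term
```

The second term pushes the spread of the dropout samples towards the size of the actual residuals, so the empirical variance of the sub-network outputs can serve as a calibrated uncertainty estimate.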
In detail, our contribution is as follows:
• we introduce a novel regression loss for better-calibrated uncertainties that is applicable to dropout networks,
• we reach state-of-the-art performance in an empirical study and improve on it when considering data shift and worst-case performances, and
• we demonstrate its applicability to real-world applications by the example of 2D bounding box regression.

2. RELATED WORK

Approaches to estimate predictive uncertainties can be broadly categorized into three groups: Bayesian approximations, ensemble approaches and parametric models.

Among Bayesian approximations, one of the most widely used in practice is MC dropout (Gal & Ghahramani, 2016), which interprets test-time dropout sampling as approximate variational inference. Further extensions of MC dropout target tuned performance by learning layer-specific drop rates using Concrete distributions (Gal et al., 2017) and by integrating aleatoric uncertainty (Kendall & Gal, 2017). Note that dropout training is also used, independently of an uncertainty context, for better model generalization (Srivastava et al., 2014).

Ensembles of neural networks, so-called deep ensembles (Lakshminarayanan et al., 2017), pose another popular approach to uncertainty modelling. Comparative studies of uncertainty mechanisms (Snoek et al., 2019; Gustafsson et al., 2020) highlight their advantageous uncertainty quality, making deep ensembles a state-of-the-art method. Fort et al. (2019) argue that deep ensembles capture the multimodality of loss landscapes and thus yield potentially more diverse sets of solutions.

The third group comprises parametric modelling approaches that extend point estimates by an additional model output interpreted as variance or covariance (Nix & Weigend, 1994; Heskes, 1997). Typically, these approaches optimize a (Gaussian) negative log-likelihood (NLL, Nix & Weigend (1994)). A more recent representative of this group is, e.g., Kendall & Gal (2017); for a review see Khosravi et al. (2011). A closely related model class is deep kernel learning, which approaches uncertainty modelling by combining NNs and Gaussian processes (GPs) in various ways, e.g. via
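For reference, the (Gaussian) NLL objective optimized by the parametric approaches mentioned above can be written down compactly. The sketch below uses the standard heteroscedastic form with a predicted log-variance for numerical stability and drops the additive constant; the variable names are ours.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Mean per-sample negative log-likelihood of N(mu, exp(log_var)),
    omitting the constant 0.5 * log(2 * pi) term."""
    return np.mean(0.5 * log_var + 0.5 * (y - mu) ** 2 / np.exp(log_var))
```

Minimizing this objective jointly fits the predicted mean `mu` and lets the predicted variance `exp(log_var)` absorb the residual noise, which is how these models obtain aleatoric uncertainty estimates.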



Figure 1: Sampling-based uncertainty mechanisms on toy datasets. In contrast to MC dropout (left), the second-moment loss (right) induces uncertainties that capture aleatoric uncertainty. The ground-truth data is shown in red. Each grey line represents the outputs of one of 200 random sub-networks obtained by applying dropout-based sampling to the trained full network. For details on the datasets ('toy-hf', 'toy-noise'), the neural architecture and the uncertainty methods, please refer to Section 4 and references therein.
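The dropout-based sampling of sub-networks described in the caption can be sketched as follows. The one-hidden-layer architecture, the weight names and the inverted-dropout scaling are purely illustrative assumptions; the weights are taken to be already trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_samples(x, W1, b1, W2, b2, p=0.5, n_samples=200):
    """Draw n_samples predictions from random dropout sub-networks of a
    one-hidden-layer regressor. Each iteration applies a fresh dropout
    mask to the hidden units, i.e. evaluates one random sub-network."""
    preds = []
    for _ in range(n_samples):
        h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
        mask = rng.random(h.shape) > p     # keep each unit w.p. 1 - p
        h = h * mask / (1.0 - p)           # inverted-dropout scaling
        preds.append(h @ W2 + b2)
    return np.stack(preds)                 # shape (n_samples, N, out_dim)
```

The per-input mean and standard deviation over the first axis of the returned array yield the point prediction and the uncertainty estimate, corresponding to the bundle of grey lines in the figure.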

