BAYESIAN NEURAL NETWORKS WITH VARIANCE PROPAGATION FOR UNCERTAINTY EVALUATION

Abstract

Uncertainty evaluation is a core technique when deep neural networks (DNNs) are used in real-world problems. In practical applications, we often encounter unexpected samples that have not been seen in the training process. For safety-critical systems, not only achieving high prediction accuracy but also detecting uncertain data is significant. In statistics and machine learning, Bayesian inference has been exploited for uncertainty evaluation. Bayesian neural networks (BNNs) have recently attracted considerable attention in this context, as a DNN trained using dropout can be interpreted as a Bayesian method. Based on this interpretation, several methods to calculate the Bayes predictive distribution for DNNs have been developed. Though the Monte-Carlo method called MC dropout is a popular method for uncertainty evaluation, it requires a number of repeated feed-forward calculations of DNNs with randomly sampled weight parameters. To overcome this computational issue, we propose a sampling-free method to evaluate uncertainty. Our method converts a neural network trained using dropout to the corresponding Bayesian neural network with variance propagation. Our method is applicable not only to feed-forward NNs but also to recurrent NNs, including LSTMs. We report the computational efficiency and statistical reliability of our method in numerical experiments on language modeling using RNNs and on out-of-distribution detection with DNNs.

1. INTRODUCTION

Uncertainty evaluation is a core technique in practical applications of deep neural networks (DNNs). As an example, consider Cyber-Physical Systems (CPS) such as automated driving systems. In the past decade, machine learning methods have been widely utilized to realize the environment perception and path-planning components in CPS. In particular, automated driving systems have drawn considerable attention as safety-critical and real-time CPS (NITRD CPS Senior Steering Group, 2012; Wing, 2009). In automated driving systems, the environment perception component is built using DNN-based predictive models. In real-world applications, the CPS is required to deal with unexpected samples that have not been seen in the training process. Therefore, not only achieving high prediction accuracy under an ideal environment but also providing uncertainty evaluation for real-world data is significant for safety-critical systems (Henne et al., 2019). When the uncertainty is high, the CPS should prepare options such as rejecting the recommended action and prompting the user's intervention. Such an interactive system is necessary to build fail-safe systems (Varshney & Alemzadeh, 2017; Varshney, 2016). On the other hand, uncertainty evaluation is also useful for enhancing the efficiency of learning algorithms: samples with high uncertainty are thought to convey important information for training networks. Active data selection based on uncertainty has been studied for a long time under the name of active learning (David et al., 1996; Gal et al., 2017; Holub et al., 2008; Li & Guo, 2013; Shui et al., 2020). In statistics and machine learning, Bayesian estimation has been commonly exploited for uncertainty evaluation (Bishop, 2006). In the Bayesian framework, prior knowledge is represented as the prior distribution of the statistical model. The prior distribution is updated to the posterior distribution based on observations.
The epistemic model uncertainty is represented in the prior distribution, and upon observing data, those beliefs are updated in the form of a posterior distribution, which yields model uncertainty conditioned on the observed data. The entropy and the variance are representative uncertainty measures (Cover & Thomas, 2006). For complicated models such as DNNs, however, a direct application of Bayesian methods is prohibitive, as the computation involves costly high-dimensional integration. In deep learning, Bayesian methods are related to stochastic learning algorithms, and this relation is utilized to approximate the posterior over complex models. The stochastic method called dropout is a powerful regularization method for DNNs (Srivastava et al., 2014): in each layer of the DNN, some units are randomly dropped during learning with stochastic gradient descent methods. Gal & Ghahramani (2016a) revealed that dropout can be interpreted as a variational Bayes method. Based on this interpretation, they proposed a simple method of sampling DNN parameters from the approximate posterior distribution. Furthermore, the uncertainty of a DNN-based prediction is evaluated using the Monte-Carlo (MC) method called MC dropout. While the Bayesian DNN trained using dropout is realized by a simple procedure, the computational overhead is not negligible. In MC dropout, dropout is used also at test time, with a number of repeated feed-forward calculations to effectively sample from the approximate posterior. Hence, naive MC dropout is not necessarily suitable for systems demanding real-time responses. In this work, we propose a sampling-free method to evaluate the uncertainty of DNN-based predictions. Our method is computationally inexpensive compared to MC dropout and provides reliable uncertainty evaluation. In the following, we first outline related works. Section 3 is devoted to the detailed formulae for calculating the uncertainty.
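The MC dropout procedure described above can be sketched in a few lines; this is a minimal illustration with a hypothetical toy network and fixed random weights, assuming NumPy is available, not the exact setup of Gal & Ghahramani (2016a):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network with fixed weights (hypothetical example).
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

def forward(x, p_drop=0.5):
    """One stochastic forward pass: dropout stays active at test time."""
    h = np.maximum(x @ W1, 0.0)            # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop    # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)          # inverted-dropout scaling
    return h @ W2

x = rng.normal(size=(1, 4))
samples = np.stack([forward(x) for _ in range(200)])  # T repeated passes

mean = samples.mean(axis=0)  # predictive mean
var = samples.var(axis=0)    # spread across passes = uncertainty estimate
```

The cost is evident from the sketch: one prediction with uncertainty requires T full forward passes, which is what the sampling-free approach proposed here avoids.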
In our method, an upper bound of the variance is propagated through each layer to evaluate the uncertainty of the output. We show that our method alleviates overconfident predictions. This property is shared with scaling methods for the calibration of class probabilities on test samples. In Section 4, we study the relation between our method and scaling methods. In Section 5, we demonstrate the computational efficiency and statistical reliability of our method through numerical experiments using both DNNs and RNNs.
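To convey the idea of propagating moments instead of sampling (a minimal single-layer sketch, not this paper's exact layer-wise bounds), the mean and variance of a dropout-perturbed linear layer can be written in closed form and checked against Monte-Carlo dropout on the same layer:

```python
import numpy as np

rng = np.random.default_rng(1)
p_drop = 0.5
W = rng.normal(size=(4, 3))
h = rng.normal(size=3)  # deterministic input activations (hypothetical)

# Closed-form moments of z = W @ (mask * h / (1 - p)) with a Bernoulli mask:
# E[z] = W h (the inverted-dropout scaling cancels in the mean), and
# Var[z_j] = sum_i W_ji^2 h_i^2 * p / (1 - p), exactly, with no sampling.
mean_z = W @ h
var_z = (W**2) @ (h**2) * p_drop / (1.0 - p_drop)

# Monte-Carlo check: sample many dropout masks for the same layer.
masks = rng.random((100000, 3)) > p_drop
samples = (masks * h / (1.0 - p_drop)) @ W.T
```

Chaining such moment computations layer by layer (with bounds where exact propagation is intractable, e.g., through nonlinearities) gives an uncertainty estimate in a single forward pass.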

2. RELATED WORKS

The framework of Bayesian inference is often utilized to evaluate the uncertainty of DNN-based predictions. In Bayesian methods, the uncertainty is represented by the predictive distribution defined from the posterior distribution of the weight parameters. MacKay (1992) proposed a simple approximation method for the posterior distribution of neural networks and demonstrated that the Bayesian method improves prediction performance on classification tasks. Graves (2011) showed that the variational method works efficiently to approximate the posterior distribution of complex neural network models. There are many approaches to evaluating the uncertainty of modern DNNs (Alex Kendall & Cipolla, 2017; Choi et al., 2018; Lu et al., 2017; Le et al., 2018). We briefly review MC-based methods and sampling-free methods.

Monte-Carlo methods based on Stochastic Learning: The randomness in the learning process can be interpreted as a prior distribution. In particular, dropout is a landmark stochastic regularization method for training DNNs (Srivastava et al., 2014). Gal & Ghahramani (2016a) proposed a simple method to generate weight parameters from the posterior distribution induced by the prior corresponding to dropout regularization. The predictive distribution is approximated by MC dropout, which computes the expected output over Monte-Carlo samples of the weight parameters. Gal & Ghahramani (2016b) reported that MC dropout works efficiently not only for feed-forward DNNs but also for recurrent neural networks (RNNs). Another sampling-based method is the ensemble of networks trained with different random seeds (Lakshminarayanan et al., 2017). However, the computational cost is high, as the bootstrap method requires repeated training with resampled data.
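The ensemble idea can be illustrated with a toy sketch (hypothetical polynomial models fitted on bootstrap resamples, not the training procedure of Lakshminarayanan et al. (2017)): disagreement among independently fitted members serves as the uncertainty estimate, and it tends to grow away from the training data.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 1-D regression data (hypothetical).
x = np.linspace(-1, 1, 50)
y = np.sin(3 * x) + 0.1 * rng.normal(size=50)

# Ensemble: each member is a cubic fit on a bootstrap resample.
K = 20
x_test = np.array([0.0, 2.0])  # 2.0 lies outside the training range
preds = []
for _ in range(K):
    idx = rng.integers(0, len(x), len(x))     # bootstrap resample
    coef = np.polyfit(x[idx], y[idx], 3)
    preds.append(np.polyval(coef, x_test))
preds = np.array(preds)

mean = preds.mean(axis=0)
var = preds.var(axis=0)  # member disagreement = uncertainty
# var is typically much larger at x = 2.0 (extrapolation) than at x = 0.0.
```

The sketch also makes the cost issue concrete: K separate fits are needed, which for DNNs means K full training runs.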
Sampling-free methods: Though MC dropout is a simple and practical method to evaluate the uncertainty, a number of feed-forward computations are necessary to approximate the predictive distribution. Recently, some sampling-free methods have been proposed for the uncertainty

