EXCESS RISK ANALYSIS FOR EPISTEMIC UNCERTAINTY WITH APPLICATION TO VARIATIONAL INFERENCE

Abstract

Bayesian deep learning plays an important role especially for its ability to evaluate epistemic uncertainty (EU). Due to computational complexity, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions, and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis exists for EU, although many numerical experiments have been conducted on it. In this study, we analyze the EU of supervised learning in approximate Bayesian inference by focusing on its excess risk. First, we theoretically show novel relations between the generalization error and widely used EU measurements, such as the variance and mutual information of the predictive distribution, and derive their convergence behaviors. Next, we clarify how the objective function of VI regularizes the EU. Based on this analysis, we propose a new objective function for VI that directly controls both the prediction performance and the EU, building on PAC-Bayesian theory. Numerical experiments show that our algorithm significantly improves EU evaluation over existing VI methods.

1. INTRODUCTION

As machine learning applications spread, understanding the uncertainty of predictions is becoming more important for increasing our confidence in machine learning algorithms (Bhatt et al., 2021). Uncertainty refers to the variability of a prediction caused by missing information. For example, in regression problems it corresponds to the error bars of predictions, and in classification problems it is often expressed as the class posterior probability, entropy, or mutual information (Hüllermeier & Waegeman, 2021; Gawlikowski et al., 2022). There are two types of uncertainty (Bhatt et al., 2021): 1) aleatoric uncertainty (AU), which is caused by noise in the data itself, and 2) epistemic uncertainty (EU), which is caused by a lack of training data. In particular, since EU can tell us which parts of the input space are yet to be learned, it is used, integrated with deep learning methods, in applications such as dataset shift (Ovadia et al., 2019), adversarial data detection (Ye & Zhu, 2018), active learning (Houlsby et al., 2011), Bayesian optimization (Hernández-Lobato et al., 2014), and reinforcement learning (Janz et al., 2019).

Mathematically, AU is defined as the Bayes risk, which expresses the fundamental difficulty of a learning problem (Depeweg et al., 2018; Jain et al., 2021; Xu, 2020). For EU, Bayesian inference is useful because the posterior distribution, updated from the prior distribution, can represent a lack of data (Hüllermeier & Waegeman, 2021). In practice, measurements such as the variance of the posterior predictive distribution and the associated conditional mutual information have been used to represent EU (Kendall & Gal, 2017; Depeweg et al., 2018). In Bayesian inference, since the posterior distribution is characterized by the training data and the model through Bayes' formula, its prediction performance and EU are determined automatically. However, due to computational issues, such exact Bayesian inference is difficult to implement, so approximation methods such as variational inference (VI) (Bishop, 2006) are often used, especially for deep Bayesian models.

Since the derived posterior distribution also depends on the properties of the approximation method, the prediction performance and EU of deep Bayesian learning are no longer automatically guaranteed through Bayes' formula. The prediction performance has been analyzed as generalization error, for example, by PAC-Bayesian theory (Alquier, 2021). Since EU is also essential
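To make the EU measurements above concrete, the following is a minimal sketch (ours, not the paper's algorithm; the function names are hypothetical) of the standard entropy-based decomposition for classification: the entropy of the posterior predictive distribution splits into the expected entropy under the posterior (an AU estimate) plus the conditional mutual information between the label and the parameters (the EU measure), all estimated from Monte Carlo samples of an approximate posterior.

```python
# Sketch under assumptions: `probs` holds S posterior samples of class
# probabilities for one input x, e.g. forward passes through a Bayesian
# network with weights drawn from a variational posterior.
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of categorical distributions along `axis`."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_decomposition(probs):
    """probs: array of shape (S, C), S posterior samples of class probabilities.

    Returns (total, aleatoric, epistemic), where `epistemic` is the Monte
    Carlo estimate of the conditional mutual information I(y; theta | x, D).
    """
    mean_p = probs.mean(axis=0)          # posterior predictive distribution
    total = entropy(mean_p)              # total predictive uncertainty
    aleatoric = entropy(probs).mean()    # expected entropy under the posterior
    epistemic = total - aleatoric        # mutual information = EU measure
    return total, aleatoric, epistemic

# Two posterior samples that disagree strongly -> large EU:
probs = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
total, au, eu = uncertainty_decomposition(probs)
```

In this toy case the predictive mean is uniform, so the total entropy is log 2, while each sample is individually confident, so most of the uncertainty is attributed to EU; for regression, the variance of the predictive mean across posterior samples plays the analogous role.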

