REAL-TIME UNCERTAINTY DECOMPOSITION FOR ONLINE LEARNING CONTROL

Anonymous authors
Paper under double-blind review

Abstract

Safety-critical decisions based on machine learning models require a clear understanding of the involved uncertainties to avoid hazardous or risky situations. While aleatoric uncertainty can be modeled explicitly given a parametric description, epistemic uncertainty instead reflects the presence or absence of training data. This paper proposes a novel, generic method for modeling epistemic uncertainty and shows its advantages over existing approaches for neural networks on various data sets. It can be combined directly with aleatoric uncertainty estimates and allows for real-time prediction, as the inference is sample-free. We exploit this property in a model-based quadcopter control setting and demonstrate how the controller benefits from differentiating between aleatoric and epistemic uncertainty when learning thermal disturbances online.

1. INTRODUCTION

With improved sensor quality and more powerful computational resources, data-driven models are increasingly applied in safety-critical domains such as autonomous driving or human-robot interaction (Grigorescu et al., 2020). However, measurements usually suffer from noise, and the available data is often scarce compared to all possible states of a complex environment. Controllers that rely on supervised learning techniques must therefore react properly to ignorance and imprecision in the model to avoid dangerous situations. To allow an implementation of risk-averse (for exploitation and safety improvements) or risk-seeking (for exploration) behavior, the model must clearly disaggregate the information in the data into more than just the "best estimate" and differentiate between sources of uncertainty. Besides the point estimate of a model, one can distinguish aleatoric (uncertainty in the data) and epistemic (uncertainty in the model) uncertainty. The former is irreducible, as it is inherent to the stochastic process the data is recorded from, while the latter originates from the limited expressive power of the model or scarce training samples (Der Kiureghian & Ditlevsen, 2009). Gaussian processes (GPs) inherently provide a measure of their fidelity through the posterior standard deviation prediction (Rasmussen & Williams, 2006). They also allow differentiating between aleatoric uncertainty (typically considered as observation noise) and epistemic uncertainty (modeled by the kernel). However, the former allows only homoscedastic (constant) estimates, while real-world applications typically require heteroscedastic uncertainty models. An extension to heteroscedastic GP regression is presented by Lazaro-Gredilla & Titsias (2011); however, it relies on a variational approximation that further increases the computational complexity, and GPs generally suffer from poor scaling to large datasets (Quinonero-Candela & Rasmussen, 2005).
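As a toy illustration of this GP-based decomposition (our own sketch with made-up data, not taken from the paper): with a squared-exponential kernel, the posterior variance of the latent function shrinks near training inputs and acts as the epistemic part, while the constant observation-noise variance plays the homoscedastic aleatoric part.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between row-vector inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X_tr, y_tr, X_te, noise_var=0.01):
    """GP posterior mean plus a split into epistemic / aleatoric variance.

    The kernel (latent-function) posterior variance shrinks near training
    data -> epistemic part; the homoscedastic noise_var is the aleatoric
    part and stays constant over the whole input space.
    """
    K = rbf_kernel(X_tr, X_tr) + noise_var * np.eye(len(X_tr))
    K_s = rbf_kernel(X_te, X_tr)
    K_ss = rbf_kernel(X_te, X_te)
    alpha = np.linalg.solve(K, y_tr)
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    epistemic_var = np.clip(np.diag(cov), 0.0, None)
    aleatoric_var = np.full(len(X_te), noise_var)
    return mean, epistemic_var, aleatoric_var

X_tr = np.linspace(-1, 1, 20)[:, None]      # toy training inputs
y_tr = np.sin(3 * X_tr[:, 0])               # toy noiseless targets
X_te = np.array([[0.0], [5.0]])             # in-distribution vs. far away
_, epi, ale = gp_predict(X_tr, y_tr, X_te)
# epi is small at 0.0 (dense data) and large at 5.0 (no data),
# while ale is the same constant at both test points
```

This also makes the homoscedasticity limitation concrete: the aleatoric estimate cannot vary with the input unless the noise model itself is made input-dependent.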
In deep learning, the modeling of uncertainties has also gained increasing interest over the past years (Kendall & Gal, 2017). Heteroscedastic aleatoric uncertainty can be captured well if the output of the stochastic process can be observed directly and its parametric distribution is known. For more general cases, however, approximation techniques such as variational inference or sampling are required (Bishop, 2006). For epistemic uncertainty estimation with neural networks (NNs), the key idea behind most approaches can be summarized as follows. Randomness is introduced to the neural network through sampling during training and inference. While the training robustifies the network against the injected noise at the training locations, the noise passes through to the output at input locations where no training data is available. For inference, multiple predictions of the network are sampled for the same inputs, which allows computing a statistical measure of the uncertainty at the output (Depeweg et al., 2018; Depeweg, 2019). However, sampling the network during inference imposes a high computational burden and is therefore not suitable for real-time-critical control tasks. An ensemble-based approach by Lakshminarayanan et al. (2017) works with far fewer instances of a network, but does not explicitly differentiate between aleatoric and epistemic uncertainty. Despite those drawbacks in the uncertainty representation of data-driven models, the control community has increasingly incorporated such models in decision making for various applications. However, all of these approaches only consider the model fidelity in general and do not differentiate between aleatoric and epistemic uncertainty. Therefore, the main contributions of this paper are the following. We propose a deep learning framework with a real-time capable epistemic uncertainty prediction.
The resulting online learning model is employed by a controller that reacts distinctly to epistemic and aleatoric uncertainty. We evaluate the proposed methods on synthetic and real-world benchmark data sets, and simulate a quadcopter controller that learns the disturbances injected by thermals online.
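The sampling-based decomposition described above can be sketched with the law of total variance: across sampled networks, the average of the predicted (aleatoric) variances plus the variance of the predicted means gives the total uncertainty, with the second term as the epistemic part. The toy "ensemble" below uses closed-form members instead of trained networks; all functions and constants are illustrative assumptions, and the 50 forward passes per query hint at why sampling at inference time is costly.

```python
import numpy as np

rng = np.random.default_rng(0)

def heteroscedastic_member(w):
    """One member of a toy ensemble: predicts a mean and an input-dependent
    (aleatoric) variance. The random weight w perturbs the mean, standing in
    for the randomness injected during training (dropout, bootstrapping, ...)."""
    def predict(x):
        mean = np.sin(3 * x) + w * x**2     # members agree near x=0, diverge far away
        aleatoric_var = 0.01 + 0.1 * x**2   # assumed parametric noise model
        return mean, aleatoric_var
    return predict

ensemble = [heteroscedastic_member(rng.normal(0.0, 0.3)) for _ in range(50)]

def decompose(x):
    """Law of total variance over ensemble members m:
       total = E_m[var_m(x)] (aleatoric) + Var_m[mean_m(x)] (epistemic)."""
    means, ale_vars = zip(*(member(x) for member in ensemble))
    aleatoric = np.mean(ale_vars)
    epistemic = np.var(means)
    return aleatoric, epistemic

ale0, epi0 = decompose(0.1)   # members agree here -> low epistemic part
ale5, epi5 = decompose(3.0)   # members disagree -> epistemic part grows
```

Note that every query requires one forward pass per member; a sample-free predictor, as proposed in this paper, avoids exactly this per-query loop.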

2. PROBLEM FORMULATION

Consider the discrete-time dynamical system¹ with control u ∈ U ⊆ R^{d_u} and state x ∈ X ⊆ R^{d_x},

x_{k+1} = g(x_k, u_k) + y_k,    (1)

where g : X × U → X is known, while y_k is an i.i.d. random vector sampled in every time step from y_k ∼ D(f(x_k)), where D(•) denotes a known distribution over real vectors y ∈ Y ⊆ R^{d_x}, parameterized by p ∈ P ⊆ R^{d_p}. These state-dependent parameters arise from an unknown mapping f : X → P. We generally denote the unknown component y_k of the dynamical system as a disturbance, but it could also be the unmodeled part of the dynamics, such as friction, or serve as a black-box model for the dynamics if no analytic description is available (g(•, •) = 0). We assume measurements can be taken to obtain the data set D_tr = {(x_i, y_i)}_{i=1}^{N_tr} with inputs X_tr = {x_i}_{i=1}^{N_tr} and outputs Y_tr = {y_i}_{i=1}^{N_tr}, such that a model f̂(•) of f(•) can be learned. N_tr ∈ N denotes the current number of training data points and is initially zero, i.e., the training set is empty. The task is to choose a control input u_k such that the system (1) follows a given reference x_des. Furthermore, the controller can take new measurements of y to improve its model over time. We consider each measurement of y to be costly, so new training points should only be collected when necessary. Applications where data collection is costly include distributed systems, in which multiple sensors share the same scarce communication channel, and autonomous systems with limited data storage capacity. The need for high data efficiency requires models that judge their own fidelity in real time to identify valuable measurements.
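A minimal sketch of the resulting event-triggered data collection, under heavy assumptions: `epistemic_proxy`, the placeholder disturbance, the closed-loop dynamics, and the threshold are all hypothetical stand-ins for the paper's learned model and controller. A costly measurement of y is taken only when the model admits ignorance at the current state.

```python
import numpy as np

def epistemic_proxy(x, X_tr, lengthscale=0.5):
    """Stand-in fidelity measure: close to 1 when x is far from every
    training input, close to 0 when the training set covers x. The paper's
    NN model would supply this epistemic prediction instead."""
    if len(X_tr) == 0:
        return 1.0
    d2 = np.min(np.sum((np.asarray(X_tr) - x) ** 2, axis=-1))
    return 1.0 - np.exp(-0.5 * d2 / lengthscale**2)

def control_loop(x0, steps, threshold=0.5):
    """Event-triggered data collection: measure the disturbance y only when
    the epistemic uncertainty at the current state exceeds the threshold."""
    X_tr, Y_tr = [], []
    x = np.array(x0, dtype=float)
    measurements = 0
    for _ in range(steps):
        if epistemic_proxy(x, X_tr) > threshold:   # costly measurement justified
            y = np.sin(x)                          # placeholder for the true disturbance
            X_tr.append(x.copy())
            Y_tr.append(y)
            measurements += 1
        x = 0.9 * x + 0.1                          # placeholder closed-loop dynamics
    return measurements, len(X_tr)

m, n = control_loop([2.0], 50)
```

With this contracting toy dynamics, only a handful of measurements are triggered before the collected data covers the visited states, after which the model stays silent even though the loop keeps running.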
As existing approaches for modeling epistemic uncertainty in deep learning suffer from high computational complexity, we first focus on developing a novel method for epistemic uncertainty prediction before proposing an online learning control strategy that makes use of a neural network model decomposing its uncertainties.



¹ Bold/capital symbols generally denote vectors/matrices; D(•), U(•), N(•), and B(•) denote a general parametric, the uniform, the Gaussian, and the Bernoulli distribution, respectively.



For example, Fanger et al. (2016) use an epistemic uncertainty measure to dynamically assign leader and follower roles for cooperative robotic manipulation. The work by Berkenkamp et al. (2016) ensures a safe exploration of an unknown task space based on GP error bounds, and a gain-scheduling approach for computed torque control is presented by Beckers et al. (2019). The work by Liu et al. (2020) considers the epistemic uncertainty as an estimate of the distance between source and target domains (known as domain shift) to design a robust controller. Umlauft & Hirche (2020) and Chowdhary et al. (2015) consider an online learning control approach for GP models, which treats the dual control problem (Wittenmark, 1995) as a model-based adaptive control problem. The work by Yesildirak & Lewis (1995) uses neural networks for adaptive control in continuous time, which relies on a time-triggered (periodic) update of the model rather than an event-based adaptation as we propose in this work. More generally, risk-averse control strategies have been presented by Umlauft et al. (2018); Medina et al. (2013); Todorov & Li (2005).

