ACCURATE BAYESIAN META-LEARNING BY ACCURATE TASK POSTERIOR INFERENCE

Abstract

Bayesian meta-learning (BML) enables fitting expressive generative models to small datasets by incorporating inductive priors learned from a set of related tasks. The Neural Process (NP) is a prominent deep neural network-based BML architecture, which has shown remarkable results in recent years. In its standard formulation, the NP encodes epistemic uncertainty in an amortized, factorized, Gaussian variational inference (VI) approximation to the BML task posterior (TP), optimized using reparametrized gradients. Prior work studies a range of architectural modifications to boost performance, such as attentive computation paths or improved context aggregation schemes, while the influence of the VI scheme remains under-explored. We aim to bridge this gap by introducing GMM-NP, a novel BML model that builds on recent work enabling highly accurate, full-covariance Gaussian mixture model (GMM) TP approximations by combining VI with natural gradients and trust regions. We show that GMM-NP yields tighter evidence lower bounds, which increases the efficiency of marginal likelihood optimization, leading to improved epistemic uncertainty estimates and predictive accuracy. GMM-NP requires no complex architectural modifications, resulting in a powerful yet conceptually simple BML model that outperforms the state of the art on a range of challenging experiments, highlighting its applicability to settings where data is scarce.
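To make the contrast between the two TP approximation families concrete, the following minimal sketch (not from the paper; all names and parameter values are hypothetical illustrations) draws samples from a factorized Gaussian, as used by the standard NP with reparametrized gradients, and from a full-covariance GMM of the kind GMM-NP employs, which can represent correlations and multimodality that a single diagonal Gaussian cannot:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_diag_gaussian(mu, log_sigma, n, rng):
    """Reparametrized samples z = mu + sigma * eps with eps ~ N(0, I),
    as in the standard NP's factorized Gaussian TP approximation.
    Written as a deterministic function of (mu, log_sigma), gradients
    could flow through z; here we only illustrate the sampling."""
    eps = rng.standard_normal((n, mu.size))
    return mu + np.exp(log_sigma) * eps

def sample_gmm(weights, mus, chols, n, rng):
    """Samples from a full-covariance Gaussian mixture: draw a component
    k ~ Cat(weights), then z = mu_k + L_k @ eps, where Sigma_k = L_k L_k^T."""
    ks = rng.choice(len(weights), size=n, p=weights)
    eps = rng.standard_normal((n, mus.shape[1]))
    return mus[ks] + np.einsum("nij,nj->ni", chols[ks], eps)

# Hypothetical 2-D task posterior with two correlated modes.
weights = np.array([0.5, 0.5])
mus = np.array([[-2.0, 0.0], [2.0, 0.0]])
chols = np.stack([
    np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 1.0]])),
    np.linalg.cholesky(np.array([[1.0, -0.8], [-0.8, 1.0]])),
])

z = sample_gmm(weights, mus, chols, n=20000, rng=rng)
print(z.mean(axis=0))   # near [0, 0]: the two modes average out

zg = sample_diag_gaussian(np.array([0.5, -1.0]), np.array([-1.0, -0.5]),
                          n=20000, rng=rng)
print(zg.std(axis=0))   # approx exp(log_sigma), i.e. about [0.37, 0.61]
```

The sketch only covers sampling; GMM-NP's contribution lies in *fitting* such a mixture accurately via natural-gradient VI with trust regions rather than plain reparametrized gradient descent.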

1. INTRODUCTION

Driven by algorithmic advances in the field of deep learning (DL) and the availability of increasingly powerful GPU-assisted hardware, the field of machine learning has achieved a plethora of impressive results in recent years (Parmar et al., 2018; Radford et al., 2019; Mnih et al., 2015). These were enabled to a large extent by the availability of huge datasets, which allow training expressive deep neural network (DNN) models. In practice, e.g., in industrial settings, such datasets are unfortunately rarely available, rendering standard DL approaches futile. Nevertheless, similar tasks often arise repeatedly: while the number of context examples on a novel target task is typically small, the joint meta-dataset of examples accumulated over time across all tasks can be massive, so that powerful inductive biases can be extracted using meta-learning (Hospedales et al., 2022). While these inductive biases allow restricting predictions to only those compatible with the meta-data, there typically remains epistemic uncertainty due to task ambiguity, as the context data is often not informative enough to identify the target task exactly. Bayesian meta-learning (BML) aims at an accurate quantification of this uncertainty, which is crucial for applications like active learning, Bayesian optimization (Shahriari et al., 2016), model-based reinforcement learning (Chua et al., 2018), robotics (Deisenroth et al., 2011), and safety-critical scenarios.

