ACCURATE BAYESIAN META-LEARNING BY ACCURATE TASK POSTERIOR INFERENCE

Abstract

Bayesian meta-learning (BML) enables fitting expressive generative models to small datasets by incorporating inductive priors learned from a set of related tasks. The Neural Process (NP) is a prominent deep neural network-based BML architecture, which has shown remarkable results in recent years. In its standard formulation, the NP encodes epistemic uncertainty in an amortized, factorized Gaussian variational inference (VI) approximation to the BML task posterior (TP), using reparametrized gradients. Prior work studies a range of architectural modifications to boost performance, such as attentive computation paths or improved context aggregation schemes, while the influence of the VI scheme remains under-explored. We aim to bridge this gap by introducing GMM-NP, a novel BML model, which builds on recent work that enables highly accurate, full-covariance Gaussian mixture model (GMM) TP approximations by combining VI with natural gradients and trust regions. We show that GMM-NP yields tighter evidence lower bounds, which increases the efficiency of marginal likelihood optimization, leading to improved epistemic uncertainty estimation and accuracy. GMM-NP does not require complex architectural modifications, resulting in a powerful, yet conceptually simple BML model, which outperforms the state of the art on a range of challenging experiments, highlighting its applicability to settings where data is scarce.

1. INTRODUCTION

Driven by algorithmic advances in the field of deep learning (DL) and the availability of increasingly powerful GPU-assisted hardware, the field of machine learning achieved a plethora of impressive results in recent years (Parmar et al., 2018; Radford et al., 2019; Mnih et al., 2015). These were enabled to a large extent by the availability of huge datasets, which enable training expressive deep neural network (DNN) models. In practice, e.g., in industrial settings, such datasets are unfortunately rarely available, rendering standard DL approaches futile. Nevertheless, it is often the case that similar tasks arise repeatedly, such that the number of context examples on a novel target task is typically relatively small, but the joint meta-dataset of examples from all tasks accumulated over time can be massive, such that powerful inductive biases can be extracted using meta-learning (Hospedales et al., 2022). While these inductive biases allow restricting predictions to only those compatible with the meta-data, there typically remains epistemic uncertainty due to task ambiguity, as the context data is often not informative enough to identify the target task exactly. Bayesian meta-learning (BML) aims at an accurate quantification of this uncertainty, which is crucial for applications like active learning, Bayesian optimization (Shahriari et al., 2016), model-based reinforcement learning (Chua et al., 2018), robotics (Deisenroth et al., 2011), and in safety-critical scenarios. Building on these insights and on recent advances in VI (Lin et al., 2020; Arenz et al., 2022), we propose GMM-NP, a novel NP-based BML algorithm that employs (i) a full-covariance Gaussian mixture model (GMM) TP approximation, optimized in a (ii) non-amortized fashion, using (iii) robust and efficient trust region natural gradient (TRNG)-VI.
We demonstrate through extensive empirical evaluations and ablations that our approach yields tighter evidence lower bounds, more efficient model optimization, and, thus, markedly improved predictive performance, outperforming the state of the art both in terms of epistemic uncertainty quantification and accuracy. Notably, GMM-NP does not require complex architectural modifications, which shows that accurate TP inference is crucial for accurate BML, an insight we believe will be valuable for future research.
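To make the contrast between the two inference schemes concrete, the following sketch illustrates (in NumPy, with toy parameters; all names are illustrative and not the paper's implementation) how a sample is drawn from a standard factorized Gaussian task posterior via the reparametrization trick, versus ancestral sampling from a full-covariance GMM task posterior as used by GMM-NP:

```python
import numpy as np

rng = np.random.default_rng(0)
d_z = 2  # latent task-descriptor dimension

# --- Standard NP: factorized (diagonal) Gaussian task posterior ---
# Reparametrization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# so the sample is differentiable w.r.t. the variational parameters.
mu = np.zeros(d_z)
log_sigma = np.full(d_z, -0.5)
eps = rng.standard_normal(d_z)
z_diag = mu + np.exp(log_sigma) * eps

# --- GMM-NP style: full-covariance Gaussian mixture task posterior ---
# K components with mixture weights, means, and full covariance matrices,
# which can capture the correlated, multi-modal posteriors shown in Fig. 1.
K = 3
weights = np.array([0.5, 0.3, 0.2])
means = rng.standard_normal((K, d_z))
# Construct valid covariances as A A^T + c I (symmetric positive definite).
A = rng.standard_normal((K, d_z, d_z))
covs = A @ A.transpose(0, 2, 1) + 0.1 * np.eye(d_z)

def sample_gmm(rng, weights, means, covs):
    """Ancestral sampling: pick a component, then sample via Cholesky."""
    k = rng.choice(len(weights), p=weights)
    L = np.linalg.cholesky(covs[k])
    return means[k] + L @ rng.standard_normal(means.shape[-1])

z_gmm = sample_gmm(rng, weights, means, covs)
```

The diagonal posterior has only 2 * d_z free parameters and is necessarily unimodal and axis-aligned, whereas each GMM component carries a full d_z x d_z covariance; this is the expressiveness gap that TRNG-VI is used to exploit.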

2. RELATED WORK

Multi-task learning aims to leverage inductive biases learned on a meta-dataset of similar tasks for improved data efficiency on unseen target tasks of similar structure. Notable variants include transfer-learning (Zhuang et al., 2020), which refines and combines pre-trained models (Golovin et al., 2017; Krizhevsky et al., 2012), and meta-learning (Schmidhuber, 1987; Thrun & Pratt, 1998; Vilalta & Drissi, 2005; Hospedales et al., 2022), which makes the multi-task setting explicit in the model design by formulating fast adaptation mechanisms in order to learn how to solve tasks with little context data ("few-shot learning"). A plethora of architectures were studied in the literature, including learner networks that adapt model parameters (Bengio et al., 1991; Schmidhuber, 1992; Ravi & Larochelle, 2017), memory-augmented DNNs (Santoro et al., 2016), early instances of Bayesian meta-models (Edwards & Storkey, 2017; Hewitt et al., 2018), and algorithms that make use of learned measures of task similarity (Koch et al., 2015; Vinyals et al., 2016; Snell et al., 2017). Arguably the most prominent meta-learning approaches are the Model-agnostic Meta-learning (MAML) and the Neural Process (NP) model families, due to their generality and flexibility. While the original MAML (Finn et al., 2017) and Conditional NP (Garnelo et al., 2018a) formulations do



Figure 1: Visualization of our GMM-NP model for a d_z = 2 dimensional latent space, trained on a meta-dataset of sinusoidal functions with varying amplitudes and phases, after having observed a single context example (red cross, right panel) from an unseen task (black dots, right panel). Left panel: unnormalized task posterior (TP) distribution (contours) and GMM TP approximation with K = 3 components (ellipses, mixture weights in %). Right panel: corresponding function samples from our model (blue lines). A single context example leaves much task ambiguity, reflected in a highly correlated, multi-modal TP. Our GMM approximation correctly captures this: predictions are in accordance with (i) the observed data (all samples pass close to the red context example), and with (ii) the learned inductive biases (all samples are sinusoidal), cf. also Fig. 12 in App. A.5.5.

A prominent BML approach is the Neural Process (NP) (Garnelo et al., 2018b), which employs a DNN-based conditional latent variable (CLV) model, in which the Bayesian belief about the target task is encoded in a factorized Gaussian task posterior (TP) approximation, and inference is amortized over tasks using set encoders (Zaheer et al., 2017). This architecture can be optimized effi-
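The amortized inference described above relies on a permutation-invariant set encoder (Zaheer et al., 2017): each context pair is encoded independently, the encodings are pooled by an order-invariant operation such as the mean, and the aggregate is mapped to the parameters of the factorized Gaussian TP. A minimal NumPy sketch (all weights and names are hypothetical toy stand-ins, not the NP reference implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def set_encode(context_xy, W_enc, W_mu, W_logvar):
    """DeepSets-style amortized encoder (illustrative):
    encode each (x, y) context pair, mean-pool across the set
    (permutation invariant), then map the aggregate to the mean and
    log-variance of a factorized Gaussian q(z | context)."""
    h = np.tanh(context_xy @ W_enc)   # per-example encoding, shape (N, h)
    r = h.mean(axis=0)                # order-invariant aggregation, shape (h,)
    return r @ W_mu, r @ W_logvar     # parameters of the Gaussian TP

# Toy dimensions: 2-D (x, y) pairs, hidden size 8, latent size d_z = 2.
W_enc, W_mu, W_logvar = (rng.standard_normal(s) for s in [(2, 8), (8, 2), (8, 2)])
context = rng.standard_normal((5, 2))  # 5 context examples

mu, logvar = set_encode(context, W_enc, W_mu, W_logvar)
# Permutation invariance: shuffling the context leaves the posterior unchanged.
mu_shuf, _ = set_encode(context[::-1], W_enc, W_mu, W_logvar)
assert np.allclose(mu, mu_shuf)
```

Because every trained weight is shared across tasks, a single forward pass yields the TP parameters for any context set; GMM-NP instead drops this amortization and optimizes per-task variational parameters directly.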

