UNCERTAINTY-AWARE META-LEARNING FOR MULTI-MODAL TASK DISTRIBUTIONS

Abstract

Meta-learning, or learning to learn, is a popular approach for learning new tasks with limited data (i.e., few-shot learning) by leveraging the commonalities among different tasks. However, meta-learned models can perform poorly when context data is limited, or when data is drawn from an out-of-distribution (OoD) task. Especially in safety-critical settings, this necessitates an uncertainty-aware approach to meta-learning. In addition, the often multimodal nature of task distributions can pose unique challenges to meta-learning methods. In this work, we present UNLIMITD (uncertainty-aware meta-learning for multimodal task distributions), a novel method for meta-learning that (1) makes probabilistic predictions on in-distribution tasks efficiently, (2) is capable of detecting OoD context data at test time, and (3) performs well on heterogeneous, multimodal task distributions. To achieve this goal, we take a probabilistic perspective and train a parametric, tuneable distribution over tasks on the meta-dataset. We construct this distribution by performing Bayesian inference on a linearized neural network, leveraging Gaussian process theory. We demonstrate that UNLIMITD's predictions compare favorably to, and outperform in most cases, the standard baselines, especially in the low-data regime. Furthermore, we show that UNLIMITD is effective in detecting data from OoD tasks. Finally, we confirm that both of these findings continue to hold in the multimodal task-distribution setting.

1. INTRODUCTION

Learning to learn is essential to human intelligence but remains an open area of research in machine learning. Meta-learning has emerged as a popular approach to enable models to perform well on new tasks using limited data. It involves first a meta-training process, during which the model learns valuable features from a set of tasks. Then, at test time, using only a few datapoints from a new, unseen task, the model (1) adapts to this new task (i.e., performs few-shot learning with context data), and then (2) infers by making predictions on new, unseen query inputs from the same task. A popular baseline for meta-learning, which has attracted a large amount of attention, is Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017), in which the adaptation process consists of fine-tuning the parameters of the model via gradient descent. However, meta-learning methods can often struggle in several ways when deployed in challenging real-world scenarios. First, when context data is too limited to fully identify the test-time task, accurate prediction can be challenging. As these predictions can be untrustworthy, this necessitates the development of meta-learning methods that can express uncertainty during adaptation (Yoon et al., 2018; Harrison et al., 2018). In addition, meta-learning models may not successfully adapt to "unusual" tasks, i.e., when test-time context data is drawn from an out-of-distribution (OoD) task not well represented in the training dataset (Jeong & Kim, 2020; Iwata & Kumagai, 2022). Finally, special care has to be taken when learning tasks that have a large degree of heterogeneity. An important example is the case of tasks with a multimodal distribution, i.e., when there are no common features shared across all the tasks, but the tasks can be broken down into subsets (modes) such that tasks from the same subset share common features (Vuorio et al., 2019).

Our contributions.
We present UNLIMITD (uncertainty-aware meta-learning for multimodal task distributions), a novel meta-learning method that leverages probabilistic tools to address the aforementioned issues. Specifically, UNLIMITD models the true distribution of tasks with a learnable distribution constructed over a linearized neural network and uses analytic Bayesian inference to perform uncertainty-aware adaptation. We present three variants (namely, UNLIMITD-I, UNLIMITD-R, and UNLIMITD-F) that reflect a trade-off between learning a rich prior distribution over the weights and maintaining the full expressivity of the network; we show that UNLIMITD-F strikes a balance between the two, making it the most appealing variant. Finally, we demonstrate that (1) our method allows for efficient probabilistic predictions on in-distribution tasks that compare favorably to, and in most cases outperform, the existing baselines, (2) it is effective in detecting context data from OoD tasks at test time, and (3) both of these findings continue to hold in the multimodal task-distribution setting.

The rest of the paper is organized as follows. Section 2 formalizes the problem. Section 3 presents background information on the linearization of neural networks and Bayesian linear regression. We detail our approach and its three variants in Section 4. We discuss related work in detail in Section 5. Finally, we present our experimental results concerning the performance of UNLIMITD in Section 6 and conclude in Section 7.
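The core machinery, covered in detail in Sections 3 and 4, combines a first-order linearization of the network with Bayesian linear regression in the resulting Jacobian feature space, which induces a Gaussian process over functions. The following is a minimal numpy sketch of that idea; the function names, the generic `f`/`jac` interfaces, and the explicit prior covariance matrix are illustrative assumptions for exposition, not the paper's implementation (which would not materialize the full parameter-space covariance for a deep network).

```python
import numpy as np

def linearize_model(f, jac, theta0):
    """First-order linearization of a parametric model around theta0:
    f_lin(x, theta) = f(x, theta0) + J(x) @ (theta - theta0).

    `f` maps (x, theta) -> predictions and `jac` returns the Jacobian of f
    w.r.t. theta at theta0 (shape (n_points, n_params)); both are assumed
    interfaces for this sketch.
    """
    def f_lin(x, theta):
        return f(x, theta0) + jac(x) @ (theta - theta0)
    return f_lin

def posterior_predict(J_ctx, y_ctx, J_query, prior_cov, noise_var):
    """Bayesian linear regression in Jacobian feature space.

    A Gaussian prior N(0, prior_cov) over the weight perturbation induces a
    GP with kernel k(x, x') = J(x) @ prior_cov @ J(x').T; we condition this
    GP on the context data (adaptation) and evaluate it at the query inputs
    (inference), returning the predictive mean and covariance.
    """
    K = J_ctx @ prior_cov @ J_ctx.T + noise_var * np.eye(len(y_ctx))
    K_star = J_query @ prior_cov @ J_ctx.T
    mean = K_star @ np.linalg.solve(K, y_ctx)
    cov = (J_query @ prior_cov @ J_query.T
           - K_star @ np.linalg.solve(K, K_star.T))
    return mean, cov
```

With near-zero observation noise and query inputs equal to the context inputs, the posterior mean reproduces the context targets and the posterior covariance collapses, as expected of exact GP conditioning.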

2. PROBLEM STATEMENT

A task T_i consists of a function f_i from which data is drawn. At test time, prediction breaks down into (1) adaptation, i.e., identifying f_i using K context datapoints (X_i, Y_i) from the task, and (2) inference, i.e., making predictions for f_i on the query inputs X_i^*. The predictions can later be compared with the query ground truths Y_i^* to estimate prediction quality, for example in terms of mean squared error (MSE). Meta-training consists of learning valuable features from a cluster of tasks, i.e., a set of similar tasks (e.g., sines with different phases and amplitudes but the same frequency), so that at test time the predictions are accurate on tasks from the same cluster. We take a probabilistic, functional perspective and represent a cluster by p(f), a theoretical distribution over the function space that describes the probability of a task belonging to the cluster. Learning p(f) is appealing, as it allows for performing OoD detection in addition to making predictions. Adaptation amounts to computing the conditional distribution given test context data, and one can obtain an uncertainty metric by evaluating the negative log-likelihood (NLL) of the context data under p(f). Thus, our goal is to construct a parametric, learnable functional distribution p_ξ(f) that approaches the theoretical distribution p(f), with a structure that allows tractable conditioning and likelihood computation, even in deep-learning contexts. In practice, however, we are not given p(f), but only a meta-training dataset D that we assume is sampled from p(f): D = {(X̄_i, Ȳ_i)}_{i=1}^N, where N is the number of tasks available during training and (X̄_i, Ȳ_i) ∼ T_i is the entire pool of data from which we can draw subsets of context data (X_i, Y_i). Consequently, in the meta-training phase, we aim to optimize p_ξ(f) to capture properties of p(f), using only the samples in D.
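When the functional distribution is a Gaussian process, the NLL-based uncertainty metric described above has a standard closed form. The sketch below computes the NLL of context targets under a zero-mean GP prior with Gram matrix K; the function name and the Cholesky-based numerics are illustrative choices, not the paper's specific implementation.

```python
import numpy as np

def gp_nll(y, K, noise_var=1e-6):
    """NLL of context targets y under a zero-mean GP prior with Gram matrix K
    (kernel evaluated on the context inputs). A high NLL suggests the context
    data is unlikely under the learned task distribution, i.e., OoD.
    """
    n = len(y)
    # Cholesky factorization for numerically stable solve and log-determinant.
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * (y @ alpha                          # data-fit term
                  + 2.0 * np.log(np.diag(L)).sum()   # log det(K)
                  + n * np.log(2.0 * np.pi))         # normalization constant
```

Thresholding this score (or feeding it to an AUC-ROC computation over a set of tasks) yields the OoD classifier evaluated in the experiments.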
Once we have p_ξ(f), we can evaluate it both in terms of how it performs for few-shot learning (by comparing the predictions with the ground truths in terms of MSE) and for OoD detection (by measuring how well the NLL of context data serves to classify in-distribution tasks against OoD tasks, measured via the AUC-ROC score).
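The AUC-ROC used for the OoD evaluation admits a simple rank-based form: it is the probability that a randomly chosen OoD task receives a higher NLL than a randomly chosen in-distribution task, with ties counted half. A small self-contained sketch (the function name and numpy implementation are illustrative; any standard AUC routine would do):

```python
import numpy as np

def auc_roc(nll_ood, nll_id):
    """AUC-ROC for classifying OoD vs. in-distribution tasks by NLL score.

    Equivalent to the probability that a random OoD task scores a strictly
    higher NLL than a random in-distribution task, plus half the probability
    of a tie.
    """
    s_ood = np.asarray(nll_ood, dtype=float)[:, None]   # shape (n_ood, 1)
    s_id = np.asarray(nll_id, dtype=float)[None, :]     # shape (1, n_id)
    wins = (s_ood > s_id).mean()    # pairwise comparisons over all pairs
    ties = (s_ood == s_id).mean()
    return wins + 0.5 * ties
```

A perfect detector (every OoD NLL above every in-distribution NLL) scores 1.0; a detector that is no better than chance scores 0.5.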

Figure 1: The true task distribution p(f) can be multimodal, i.e., contain multiple clusters of tasks (e.g., lines and sines). Our approach UNLIMITD fits p(f) with a parametric, tuneable distribution p_ξ(f) yielded by Bayesian linear regression on a linearized neural network.

