TASK CALIBRATION FOR DISTRIBUTIONAL UNCERTAINTY IN FEW-SHOT CLASSIFICATION

Abstract

As numerous meta-learning algorithms improve performance on few-shot classification problems for practical applications, accurate prediction of uncertainty, though challenging, has come to be considered essential. In this study, we consider modeling uncertainty in a few-shot classification framework and propose a straightforward method that appropriately predicts task uncertainty. We observe that random task sampling can generate tasks in which it is hard for the model to infer the queries from the support examples. Specifically, by measuring the distributional mismatch between support and query sets via class-wise similarities, we propose a novel meta-training scheme that lets the model predict with careful confidence. Moreover, our method is algorithm-agnostic and readily extends to a range of meta-learning models. Through extensive experiments, including dataset shift, we show that our training strategy helps the model avoid being indiscriminately confident and thereby produce calibrated classification results without loss of accuracy.

1. INTRODUCTION

With the great success of deep learning over the last decade, there has been growing interest in methods that more closely mimic human intelligence. One desirable characteristic of human cognition is the ability to learn new information quickly. Few-shot learning is a problem that requires machines to learn from only a few examples, and it is usually solved by meta-learning algorithms. By training over a number of few-shot tasks, a meta-learning model learns to adapt rapidly to new data and perform the desired task with the help of prior across-task knowledge. Noteworthy approaches include learning a metric space (Koch et al., 2015; Snell et al., 2017; Sung et al., 2018), learning an update rule (Ravi & Larochelle, 2017; Andrychowicz et al., 2016), or learning an initialization (Maclaurin et al., 2015; Finn et al., 2017).

Meanwhile, calibration is critical in real-life applications because a model should correctly inform humans or other models of its degree of uncertainty. Misplaced confidence or overconfidence of a deep network can lead to dramatically different outcomes in decision-making processes such as autonomous driving (Helldin et al., 2013) or medical diagnosis (Cabitza et al., 2017). Calibration is even more crucial in few-shot learning because, given only a few data points, the model is likely to assign wrong confidence to predictions on unseen data. Moreover, existing calibration methods, while effective in general classification, are not readily applicable to few-shot learning. We therefore present a novel method to measure task-level uncertainty and exploit it to build a well-calibrated model. When a model is meta-trained to solve few-shot classification, each task is generated by randomly choosing classes and sampling support and query examples of the corresponding classes.
However, this random sampling can generate a wide variety of tasks, and in some of them it is hard for the model to infer the queries from the support examples. Common meta-learning approaches force the model to learn from such tasks anyway, lacking any discussion of task generation. In this study, we design an algorithm-agnostic calibration method for few-shot classification models by diminishing the learning signal from these ill-defined tasks. This novel training lets the model predict with careful confidence, ultimately achieving better calibration. To summarize, our contributions are as follows:

• We propose a simple and straightforward method by which a few-shot classification model learns ill-defined tasks in a cautious way, so as not to place wrong confidence in its predictions.

• We extend our approach from an optimization-based to a metric-based framework, confirming that our method is versatile and can easily be applied to a range of meta-learning models.

• We demonstrate that we can design an uncertainty-aware model that is well suited to calibration and robustness.

To the best of our knowledge, a weighted meta-update has been used in the reinforcement learning literature (Xu et al., 2019) to prioritize and assign different weights to each trajectory. However, our work is the first to quantify task uncertainty and apply it in the few-shot classification framework. Moreover, we validate our work through a dataset-shift experiment, which is more challenging and rarely addressed because estimation ability degrades under shift (Snoek et al., 2019).
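To illustrate the idea of diminishing the learning signal from ill-defined tasks, the following is a minimal sketch of a weighted outer-loop objective. The normalization and function names here are our own simplified assumptions, not the paper's exact formulation:

```python
import numpy as np

def weighted_meta_objective(task_losses, task_weights):
    """Outer-loop objective: each task's query loss is scaled by a per-task
    confidence weight, so ill-defined tasks (low weight) contribute a
    diminished learning signal to the meta-update."""
    losses = np.asarray(task_losses, dtype=float)
    weights = np.asarray(task_weights, dtype=float)
    # Normalize weights so the overall loss scale is preserved across batches.
    weights = weights / (weights.sum() + 1e-8)
    return float((weights * losses).sum())
```

With equal weights this reduces to the ordinary average of task losses; setting a task's weight near zero removes its feedback, as with Task 2 in Figure 1.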

2. PRELIMINARIES

2.1. PROBLEM SETUP

A meta-learning problem is composed of tasks drawn from a task distribution p(τ), and for each meta-training task τ ∼ p(τ) there is a training set and a test set. In typical few-shot classification, each task comprises data from N randomly selected classes with k support examples per class, for a total of kN examples. We denote this support set as $\{(x_s^{(\tau)}, y_s^{(\tau)})\}_{s=1}^{kN} := \mathcal{D}_{spt}^{(\tau)}$; this is called the N-way k-shot problem. There is also a disjoint query set, $\{(x_q^{(\tau)}, y_q^{(\tau)})\}_{q=1}^{k'N} := \mathcal{D}_{qry}^{(\tau)}$. Meta-training is an iterative process of optimizing on $\mathcal{D}_{qry}$ after having seen $\mathcal{D}_{spt}$. This setup resembles learning a novel task with only a few training examples: adaptation on $\mathcal{D}_{spt}$ corresponds to training on the novel task data, and given those few examples, the meta-learner is expected to make predictions on unseen data, which in our setup is $\mathcal{D}_{qry}$. That is, the meta-learner evolves by acquiring general information across tasks through optimization on $\mathcal{D}_{qry}$.
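The episodic setup above can be sketched as follows. This is a generic illustration of N-way k-shot task sampling; the dataset layout and helper names are our own assumptions:

```python
import random

def sample_episode(data_by_class, n_way, k_shot, k_query):
    """Sample one N-way k-shot task: draw n_way classes at random, then
    split k_shot + k_query examples per class into disjoint support and
    query sets. Labels are re-indexed to 0..n_way-1 within the episode."""
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + k_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```

The support set has kN examples and the query set k'N, matching the notation above; because both are drawn from the same small pool, some episodes will exhibit the support-query discrepancy discussed in Section 2.2.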

2.2. UNCERTAINTY IN FEW-SHOT CLASSIFICATION

A good meta-learner should not only perform well on given tasks but should also make reasonable decisions based on aleatoric or epistemic uncertainty (Der Kiureghian & Ditlevsen, 2009). While aleatoric uncertainty is irreducible uncertainty in the data, including class overlap, observation noise, or data ambiguity, epistemic uncertainty arises from the stochasticity of the model itself and can be reduced by obtaining sufficient data. For example, aleatoric uncertainty arises when 'dog' images have inherently high noise (e.g., blurred and corrupted) and thus resemble another class such as 'cat' (known-unknown). Epistemic uncertainty arises when the model is not sufficiently trained to map the images to 'dog'. There is another source, distributional uncertainty, which is ascribed to a mismatch between training and test data (Malinin & Gales, 2018). In few-shot classification, this corresponds to the discrepancy between support and query sets, often caused by intra-class variation (Chen et al., 2019). For example, distributional uncertainty may arise when the support examples for the 'dog' class happen to be bulldogs while the query is a chihuahua, resulting in large uncertainty in mapping the chihuahua to any class (unknown-unknown). Typical meta-learning algorithms do not consider this mismatch, which we call support-query discrepancy, and naïvely adapt to every observed support set.
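One simple way to quantify such support-query discrepancy via class-wise similarities is to compare class prototypes of the two sets. The cosine-similarity measure below is a sketch under our own assumptions, not necessarily the exact measure used in the method:

```python
import numpy as np

def class_prototypes(features, labels, n_classes):
    """Mean embedding per class; rows of `features` are examples,
    `labels` holds integer classes in [0, n_classes)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def task_weight(spt_feat, spt_lab, qry_feat, qry_lab, n_classes):
    """Task-level confidence in [0, 1]: mean class-wise cosine similarity
    between support and query prototypes. A low value flags a task with
    a large support-query discrepancy (e.g., bulldog support vs.
    chihuahua query)."""
    p_s = class_prototypes(spt_feat, spt_lab, n_classes)
    p_q = class_prototypes(qry_feat, qry_lab, n_classes)
    cos = np.sum(p_s * p_q, axis=1) / (
        np.linalg.norm(p_s, axis=1) * np.linalg.norm(p_q, axis=1) + 1e-8)
    return float(np.clip(cos.mean(), 0.0, 1.0))
```

A task whose query prototypes align with its support prototypes scores near 1, while a mismatched task scores near 0 and can accordingly be down-weighted during meta-training.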



Figure 1: After adaptation to the support examples (blue), inference is made from the query examples (red), which guides the following meta-update. While Task 1 comprises well-distributed D_spt and D_qry, Task 2 holds a significant support-query discrepancy. We limit the feedback flow from Task 2.


