INTERVAL BOUND INTERPOLATION FOR FEW-SHOT LEARNING WITH FEW TASKS

Anonymous

Abstract

Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks from the same task distribution, using a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. This becomes more difficult when only a limited number of tasks are available for training. In such a few-task few-shot setting, it is beneficial to explicitly preserve the local neighborhoods from the task manifold and to exploit them to generate artificial tasks for training. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds. We apply our framework to both the model-agnostic meta-learning and prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains in comparison to recent methods.

1. INTRODUCTION

Few-shot learning problems deal with diverse tasks consisting of subsets of data drawn from the same underlying data manifold along with associated labels. The joint distribution of data and corresponding labels which governs the sampling of such tasks is often called the task distribution (Finn et al., 2017; Yao et al., 2022). Consequently, few-shot learning methods attempt to leverage the knowledge acquired by training on a large pool of such tasks to easily generalize to unseen tasks from the same distribution, using only a few labeled examples. We hereafter refer to the support of the task distribution as the task manifold, which is distinct from but closely related to the data manifold associated with the data distribution. Since the unseen tasks are sampled from the same underlying manifold governing the task distribution, we should ideally learn a good representation of the task manifold by preserving the neighborhoods from the high-dimensional manifold in the lower-dimensional feature embedding (Tenenbaum et al., 2000; Roweis & Saul, 2000; Van der Maaten & Hinton, 2008). However, the labels associated with a task can define any arbitrary partitioning of the data. Therefore, we can attempt to preserve the neighborhood for a task by simply conserving the neighborhoods for the corresponding subset of the data manifold in the feature embedding learned by the few-shot learner. This facilitates effective generalization to new tasks using a limited amount of labeled data, by updating only the classifier, as the learned feature embedding would likely require very little adaptation. However, existing few-shot learning methods lack an explicit mechanism for achieving this. Further, real-world few-shot learning scenarios like rare disease detection may not have the large number of training tasks required for effective learning, due to constraints such as data collection costs, privacy concerns, and/or data availability in newer domains (Yao et al., 2022).
In such scenarios, few-shot learning methods are prone to overfit the training tasks, thus limiting their ability to generalize to unseen tasks. Therefore, in this work, we develop a method to explicitly constrain the feature embedding so as to preserve neighborhoods from the high-dimensional task manifold, and to construct artificial tasks within these neighborhoods in the feature space, improving performance when only a limited number of training tasks are available. The proposed approach relies on characterizing the neighborhoods from the high-dimensional task manifold and propagating them through the network with the intent to preserve the task neighborhood in the feature space. We achieve this by employing the concept of interval bounds from the provably robust training literature (Gowal et al., 2019; Morawiecki et al., 2020), i.e., the axis-aligned bounds for the activations in each layer, obtained using interval arithmetic (Sunaga, 1958). Concretely, as shown in Figure 1, we first define a small ϵ-neighborhood for each few-shot training task and then use Interval Bound Propagation (IBP; Gowal et al., 2019) to obtain the bounding box around the mapping of the corresponding neighborhood in the feature embedding space. We then explicitly attempt to preserve the ϵ-neighborhoods by minimizing the distance between a task and its respective interval bounds, in addition to optimizing the few-shot classification objective. We further devise a mechanism to construct artificial tasks by interpolating between a task and its corresponding IBP bounds. It is important to note that this setup is distinct from provably robust training in that we do not attempt to minimize (or even calculate) the worst-case classification loss.
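To make the interval-bound machinery concrete, the sketch below propagates an axis-aligned ϵ-box through a tiny two-layer network using interval arithmetic, in the style of IBP (Gowal et al., 2019). The network sizes, weights, and ϵ value are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def interval_linear(l, u, W, b):
    """Propagate an axis-aligned interval [l, u] through x -> W @ x + b
    using interval arithmetic (center/radius form, as in IBP)."""
    mu = (u + l) / 2.0          # center of the box
    r = (u - l) / 2.0           # radius of the box
    mu_out = W @ mu + b
    r_out = np.abs(W) @ r       # worst-case spread of each output unit
    return mu_out - r_out, mu_out + r_out

def interval_relu(l, u):
    """ReLU is monotone, so the bounds pass through elementwise."""
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# Toy example: a 2-layer embedding network on a single input (assumed shapes).
rng = np.random.default_rng(0)
x = rng.normal(size=4)
eps = 0.1
l, u = x - eps, x + eps                      # epsilon-neighborhood around x
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

l, u = interval_relu(*interval_linear(l, u, W1, b1))
l, u = interval_linear(l, u, W2, b2)         # bounding box in embedding space

z = W2 @ np.maximum(W1 @ x + b1, 0.0)        # exact embedding of x
assert np.all(l <= z) and np.all(z <= u)     # IBP bounds are sound
```

Soundness of the propagated box (the exact embedding always lies inside it) is what allows the bounds to stand in for the mapped ϵ-neighborhood during training.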


Various methods have been proposed to mitigate the few-task few-shot problem using approaches such as explicit regularization (Jamal & Qi, 2019; Yin et al., 2019), intra-task augmentation (Lee et al., 2020; Ni et al., 2021; Yao et al., 2021), and inter-task interpolation to construct new artificial tasks (Yao et al., 2022). While inter-task interpolation has been shown to be the most effective among these existing approaches, it suffers from the limitation that the artificially created tasks may be generated away from the task manifold depending on the curvature of the feature embedding space, as there is no natural way to select pairs of tasks which are close to each other on the manifold (Figure 2(a)). The interval bounds obtained using IBP, on the other hand, are likely to be close to the original task embedding, as we explicitly minimize the distance between a task and its interval bounds. Thus, using them for interpolation is likely to keep the generated tasks close to the manifold (Figure 2(b)).
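The bound-interpolation idea can be sketched as mixing a task's embedding with a point sampled inside its IBP bounding box. The Beta-distributed mixing weight below (common in mixup-style augmentation) and the function names are illustrative assumptions, not necessarily the paper's exact scheme:

```python
import numpy as np

def interpolate_with_bounds(z, lb, ub, alpha=0.5, rng=None):
    """Form an artificial task embedding by interpolating between a task
    embedding z and a random point inside its IBP bounding box [lb, ub].
    Assumes z itself lies inside the box, so the result stays inside too."""
    if rng is None:
        rng = np.random.default_rng()
    anchor = rng.uniform(lb, ub)    # random point in the bounding box
    lam = rng.beta(alpha, alpha)    # mixing coefficient in [0, 1] (assumed)
    return lam * z + (1.0 - lam) * anchor
```

Because the result is a convex combination of two points inside the box, and the box is kept tight around the task by the bound losses, the generated task stays near the original task's neighborhood on the manifold.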



Figure 1: Illustration of the proposed interval bound propagation-aided few-shot learning setup (best viewed in color): We use interval arithmetic to define a small ϵ-neighborhood around a training task T_i sampled from the task distribution p(T). IBP is then used to obtain the bounding box around the mapping of the said neighborhood in the embedding space f_θS given by the first S layers of the learner f_θ. While training the learner f_θ to minimize the classification loss L_CE on the query set D_qi, we additionally attempt to minimize the losses L_LB and L_UB, forcing the ϵ-neighborhood to remain compact in the embedding space as well.
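The bound losses in the figure can be read as distances between a task embedding and its propagated bounds. A minimal sketch, assuming an L2 distance and a hypothetical trade-off weight beta (the paper's exact loss form and weighting may differ):

```python
import numpy as np

def bound_losses(z, lb, ub):
    """Distances from the task embedding z to its IBP lower and upper
    bounds. Minimizing these keeps the epsilon-neighborhood compact in
    the embedding space. L2 distance is an illustrative assumption."""
    l_lb = np.linalg.norm(z - lb)
    l_ub = np.linalg.norm(ub - z)
    return l_lb, l_ub

def total_loss(l_ce, l_lb, l_ub, beta=0.1):
    # beta is a hypothetical trade-off weight between the classification
    # objective and neighborhood compactness.
    return l_ce + beta * (l_lb + l_ub)
```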

