UNSUPERVISED META-LEARNING VIA FEW-SHOT PSEUDO-SUPERVISED CONTRASTIVE LEARNING

Abstract

Unsupervised meta-learning aims to learn generalizable knowledge across a distribution of tasks constructed from unlabeled data. Here, the main challenge is how to construct diverse tasks for meta-learning without label information; recent works have proposed, e.g., pseudo-labeling via pretrained representations or creating synthetic samples via generative models. However, such task construction strategies are fundamentally limited by their heavy reliance on immutable pseudo-labels during meta-learning and on the quality of the representations or the generated samples. To overcome these limitations, we propose a simple yet effective unsupervised meta-learning framework, coined Pseudo-supervised Contrast (PsCo), for few-shot classification. Inspired by the recent self-supervised learning literature, PsCo utilizes a momentum network and a queue of previous batches to improve pseudo-labeling and to construct diverse tasks in a progressive manner. Our extensive experiments demonstrate that PsCo outperforms existing unsupervised meta-learning methods on various in-domain and cross-domain few-shot classification benchmarks. We also validate that PsCo easily scales to a large-scale benchmark, while recent prior-art meta-learning schemes do not.

1. INTRODUCTION

Learning to learn (Thrun & Pratt, 1998), also known as meta-learning, aims to learn general knowledge about how to solve unseen yet relevant tasks from prior experience in solving diverse tasks. In recent years, the concept of meta-learning has found various applications, e.g., few-shot classification (Snell et al., 2017; Finn et al., 2017), reinforcement learning (Duan et al., 2017; Houthooft et al., 2018; Alet et al., 2020), hyperparameter optimization (Franceschi et al., 2018), and so on. Among them, few-shot classification is arguably the most popular one, whose goal is to learn to classify test samples of classes unseen during (meta-)training given only a few labeled samples. The common approach is to construct a distribution of few-shot classification (i.e., N-way K-shot) tasks and optimize a model to generalize across tasks (sampled from the distribution) so that it can rapidly adapt to new tasks. This approach has shown remarkable performance in various few-shot classification tasks but suffers from limited scalability, as the task construction phase typically requires a large number of human-annotated labels.

To mitigate the issue, there have been several recent attempts to apply meta-learning to unlabeled data, i.e., unsupervised meta-learning (UML) (Hsu et al., 2019; Khodadadeh et al., 2019; 2021; Lee et al., 2021; Kong et al., 2021). To perform meta-learning without labels, the authors have suggested various ways to construct synthetic tasks. For example, pioneering works (Hsu et al., 2019; Khodadadeh et al., 2019) assigned pseudo-labels via data augmentations or via clustering based on pretrained representations. In contrast, recent approaches (Khodadadeh et al., 2021; Lee et al., 2021; Kong et al., 2021) utilized generative models to generate synthetic (in-class) samples or to learn unknown labels via categorical latent variables.
They have achieved moderate performance on few-shot learning benchmarks, but are fundamentally limited in that: (a) the pseudo-labeling strategies are fixed during meta-learning, making it impossible to correct mislabeled samples; and (b) the generative approaches heavily rely on the quality of generated samples and are cumbersome to scale to large setups.

To overcome the limitations of the existing UML approaches, in this paper, we ask whether one can (a) progressively improve a pseudo-labeling strategy during meta-learning, and (b) construct more diverse tasks without generative models. We draw inspiration from recent advances in the self-supervised learning literature (He et al., 2020; Khosla et al., 2020), which has shown remarkable success in representation learning without labeled data. In particular, we utilize (a) a momentum network to progressively improve pseudo-labeling via temporal ensembling; and (b) a momentum queue to construct diverse tasks from previous mini-batches in an online manner.

Formally, we propose Pseudo-supervised Contrast (PsCo), a novel and effective unsupervised meta-learning framework for few-shot classification. Our key idea is to construct few-shot classification tasks using the current and previous mini-batches based on the momentum network and the momentum queue.
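The two components above can be made concrete with a minimal NumPy sketch of (a) the EMA update of the momentum network and (b) a fixed-size FIFO queue of momentum features, in the spirit of MoCo (He et al., 2020). All names (`ema_update`, `FeatureQueue`) are illustrative, not from the paper.

```python
import numpy as np

def ema_update(theta, phi, m=0.99):
    """Momentum update: phi <- m * phi + (1 - m) * theta (temporal ensemble).
    theta/phi are dicts mapping parameter names to arrays."""
    return {k: m * phi[k] + (1.0 - m) * theta[k] for k in theta}

class FeatureQueue:
    """Fixed-size FIFO queue storing momentum features of previous mini-batches."""
    def __init__(self, size, dim):
        self.buf = np.zeros((size, dim))
        self.ptr = 0
        self.size = size

    def enqueue(self, feats):
        # Overwrite the oldest entries with the newest momentum features,
        # wrapping around the buffer when necessary.
        idx = (self.ptr + np.arange(len(feats))) % self.size
        self.buf[idx] = feats
        self.ptr = (self.ptr + len(feats)) % self.size
```

Because `phi` lags behind `theta`, the features used for pseudo-labeling change slowly and consistently, which is what allows the labeling strategy to improve progressively rather than stay fixed.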
Specifically, given a random mini-batch of N unlabeled samples, we treat them as N queries (i.e., test samples) of N different labels, and then select K shots (i.e., training samples) for each label from the queue of previous mini-batches, based on representations extracted by the momentum network. To further improve the selection procedure, we apply top-K sampling after running a matching algorithm, Sinkhorn-Knopp (Cuturi, 2013). Finally, we optimize our model via supervised contrastive learning (Khosla et al., 2020) to solve the N-way K-shot task. Note that our few-shot task construction relies not only on the current mini-batch but also on the momentum network and the queue of previous mini-batches. Therefore, our task construction (i.e., pseudo-labeling) strategy (a) is progressively improved during meta-learning via the momentum network, and (b) yields diverse tasks, since the shots can be selected from effectively the entire dataset. Our framework is illustrated in Figure 1.

Through extensive experiments, we demonstrate the effectiveness of the proposed framework, PsCo, under various few-shot classification benchmarks. First, PsCo achieves state-of-the-art performance on both the Omniglot (Lake et al., 2011) and miniImageNet (Ravi & Larochelle, 2017) few-shot benchmarks; its performance is even competitive with supervised meta-learning methods. Next, PsCo also shows superiority under cross-domain few-shot learning scenarios. Finally, we demonstrate that PsCo is scalable to a large-scale benchmark, ImageNet (Deng et al., 2009).

We summarize our contributions as follows:

• We propose PsCo, an effective unsupervised meta-learning (UML) framework for few-shot classification, which constructs diverse few-shot pseudo-tasks without labels by utilizing the momentum network and the queue of previous batches in a progressive manner.
• We achieve state-of-the-art performance on the few-shot classification benchmarks Omniglot (Lake et al., 2011) and miniImageNet (Ravi & Larochelle, 2017). For example, PsCo outperforms the prior art of UML, Meta-SVEBM (Kong et al., 2021), by more than 5% accuracy (58.03→63.26) on 5-way 5-shot tasks of miniImageNet (see Table 1).
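The shot-selection step described above can be sketched in a few lines of NumPy: Sinkhorn-Knopp (Cuturi, 2013) turns a similarity matrix between queue entries and queries into a balanced soft assignment A, and top-K then picks each query's K most strongly assigned queue entries as its pseudo-labeled shots. This is a simplified illustration under our own assumptions (a small fixed iteration count, no temperature tuning), not the paper's exact implementation.

```python
import numpy as np

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Sinkhorn-Knopp: convert a (queue_size x N) similarity matrix into a
    balanced soft assignment by alternating row/column normalization, then
    renormalize columns so each query's weights over the queue sum to 1."""
    A = np.exp(scores / eps)
    for _ in range(n_iters):
        A /= A.sum(axis=0, keepdims=True)  # balance over queue entries
        A /= A.sum(axis=1, keepdims=True)  # balance over queries
    return A / A.sum(axis=0, keepdims=True)

def select_shots(A, k):
    """Top-K selection: for each query (column of A), pick the K queue
    indices with the largest assignment weight as its K shots."""
    return np.argsort(-A, axis=0)[:k]  # shape (K, N)
```

The balancing step is what distinguishes this from naive nearest-neighbor pseudo-labeling: it discourages many queries from claiming the same queue entries, which keeps the constructed N-way K-shot tasks diverse.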



Figure 1: An overview of the proposed Pseudo-supervised Contrast (PsCo). PsCo constructs an N-way K-shot few-shot classification task using the current mini-batch {x_i} and the queue of previous mini-batches; and then, it learns the task via contrastive learning. Here, A is a label assignment matrix found by the Sinkhorn-Knopp algorithm (Cuturi, 2013), 𝒜 is a pre-defined augmentation distribution, f is a backbone feature extractor, g and h are projection and prediction MLPs, respectively, and ϕ is an exponential moving average (EMA) of the model parameter θ.
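The contrastive objective used to learn the constructed task is a supervised contrastive loss (Khosla et al., 2020): each query is pulled toward its K pseudo-positive shots and pushed away from the rest. Below is a minimal NumPy sketch of that loss; the function name and the binary positive-mask interface are our own illustrative choices, and features are assumed to be L2-normalized.

```python
import numpy as np

def supcon_loss(q, s, pos, tau=0.2):
    """Supervised contrastive loss over pseudo-labeled shots.
    q: (N, d) query features, s: (M, d) shot features (both L2-normalized),
    pos: (N, M) binary mask marking each query's K pseudo-positive shots."""
    logits = q @ s.T / tau  # (N, M) temperature-scaled similarities
    # Log-softmax over all shots for each query.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-likelihood over each query's positives.
    return -(pos * log_prob).sum(axis=1) / pos.sum(axis=1)
```

As expected, the loss is small when a query's positives are its nearest shots and grows as positives move away; for example, a query aligned with its marked positive incurs near-zero loss, while the same query with the mask flipped to a distant shot incurs a large loss.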

