IEPT: INSTANCE-LEVEL AND EPISODE-LEVEL PRE-TEXT TASKS FOR FEW-SHOT LEARNING

Abstract

The need to collect large quantities of labeled training data for each new task has limited the usefulness of deep neural networks. Given data from a set of source tasks, this limitation can be overcome using two transfer learning approaches: few-shot learning (FSL) and self-supervised learning (SSL). The former aims to learn 'how to learn' by designing learning episodes from source tasks that simulate the challenge of solving the target new task with few labeled samples. In contrast, the latter exploits an annotation-free pretext task across all source tasks in order to learn generalizable feature representations. In this work, we propose a novel Instance-level and Episode-level Pretext Task (IEPT) framework that seamlessly integrates SSL into FSL. Specifically, given an FSL episode, we first apply geometric transformations to each instance to generate extended episodes. At the instance level, transformation recognition is performed as per standard SSL. Importantly, at the episode level, two SSL-FSL hybrid learning objectives are devised: (1) The consistency across the predictions of an FSL classifier from different extended episodes is maximized as an episode-level pretext task. (2) The features extracted from each instance across different episodes are integrated to construct a single FSL classifier for meta-learning. Extensive experiments show that our proposed model (i.e., FSL with IEPT) achieves the new state-of-the-art.
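As a concrete illustration of the episode-extension step, the geometric transformations can be instantiated with the four planar rotations (0, 90, 180, 270 degrees): every instance in an episode is rotated by each angle, yielding four extended episodes, and the rotation index serves as the label for the instance-level transformation-recognition task. The sketch below is a minimal NumPy illustration under this rotation assumption, not the authors' implementation:

```python
import numpy as np

def extend_episode(images):
    """Generate four extended episodes by rotating every image in an
    FSL episode by 0, 90, 180 and 270 degrees, together with the
    rotation labels used for instance-level transformation recognition.

    images: array of shape (N, H, W, C) -- one episode of N instances.
    Returns (extended, rot_labels):
      extended:   (4, N, H, W, C) -- one extended episode per rotation
      rot_labels: (4, N)          -- pretext label in {0, 1, 2, 3}
    """
    # Rotate the whole batch in the spatial axes (1, 2), once per angle.
    extended = np.stack(
        [np.rot90(images, k, axes=(1, 2)) for k in range(4)]
    )
    # Each extended episode shares a single rotation label for all N instances.
    rot_labels = np.repeat(np.arange(4)[:, None], images.shape[0], axis=1)
    return extended, rot_labels
```

In a full pipeline, the feature extractor would embed all four extended episodes; an auxiliary head would be trained on `rot_labels` (the instance-level pretext task), while the episode-level objectives operate on the per-episode FSL predictions and integrated features.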

1. INTRODUCTION

Deep convolutional neural networks (CNNs) (Krizhevsky et al., 2012; He et al., 2016b; Huang et al., 2017) have achieved tremendous success in a wide range of application fields, especially in visual recognition. However, the powerful learning ability of CNNs depends on a large amount of manually labeled training data. In practice, for many visual recognition tasks, sufficient manual annotation is either too costly to collect or not feasible (e.g., for rare object classes). This has severely limited the usefulness of CNNs in real-world application scenarios. Attempts have been made recently to mitigate this limitation from two distinct perspectives, resulting in two popular research lines, both of which aim to transfer knowledge learned from the data of a set of source tasks to a new target task: few-shot learning (FSL) and self-supervised learning (SSL). FSL (Fei-Fei et al., 2006; Vinyals et al., 2016; Finn et al., 2017; Snell et al., 2017; Sung et al., 2018) typically takes a 'learning to learn' or meta-learning paradigm. That is, it aims to learn an algorithm for learning from few labeled samples that generalizes well across tasks. To that end, it adopts

