MELR: META-LEARNING VIA MODELING EPISODE-LEVEL RELATIONSHIPS FOR FEW-SHOT LEARNING

Abstract

Most recent few-shot learning (FSL) approaches are based on episodic training, whereby each episode samples few training instances (shots) per class to imitate the test condition. However, this strict adherence to the test condition has a negative side effect: the trained model is susceptible to the poor sampling of the few shots. In this work, for the first time, this problem is addressed by exploiting inter-episode relationships. Specifically, a novel meta-learning via modeling episode-level relationships (MELR) framework is proposed. By sampling two episodes containing the same set of classes for meta-training, MELR is designed to ensure that the meta-learned model is robust against the presence of poorly-sampled shots in the meta-test stage. This is achieved through two key components: (1) a Cross-Episode Attention Module (CEAM) that alleviates the effects of poorly-sampled shots, and (2) a Cross-Episode Consistency Regularization (CECR) that enforces consistency between the two classifiers learned from the two episodes, even when unrepresentative instances are present. Extensive experiments for non-transductive standard FSL on two benchmarks show that our MELR achieves 1.0%-5.0% improvements over the baseline (i.e., ProtoNet) on which our model is built, and outperforms the latest competitors under the same settings.
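The core sampling idea in the abstract, drawing two episodes over the same set of classes but with different instances, can be sketched as follows. This is an illustrative reconstruction, not the paper's released code; the function name and episode sizes are assumptions.

```python
import random

def sample_melr_episode_pair(class_to_images, n_way=5, k_shot=1, q_query=15):
    """Sample two episodes over the SAME n_way classes (MELR-style meta-training).

    class_to_images: dict mapping class label -> list of image ids.
    Returns the shared class list and two (support, query) episodes, each
    built from a fresh draw of instances so their shots differ.
    """
    classes = random.sample(sorted(class_to_images), n_way)
    episodes = []
    for _ in range(2):  # two episodes, identical class set, different instances
        support, query = [], []
        for c in classes:
            imgs = random.sample(class_to_images[c], k_shot + q_query)
            support += [(img, c) for img in imgs[:k_shot]]
            query += [(img, c) for img in imgs[k_shot:]]
        episodes.append((support, query))
    return classes, episodes
```

Because both episodes share the class set, a model can be asked to behave consistently across them, which is what CEAM and CECR exploit.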

1. INTRODUCTION

Deep convolutional neural networks (CNNs) have achieved tremendous success in a wide range of computer vision tasks including object recognition (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Russakovsky et al., 2015; He et al., 2016a), semantic segmentation (Long et al., 2015; Chen et al., 2018), and object detection (Ren et al., 2015; Redmon et al., 2016). For most visual recognition tasks, at least hundreds of labeled training images are required from each class for training a CNN model. However, collecting a large number of labeled training samples is costly and may even be impossible in real-life application scenarios (Antonie et al., 2001; Yang et al., 2012). To reduce the reliance of deep neural networks on large amounts of annotated training data, few-shot learning (FSL) has been studied (Vinyals et al., 2016; Finn et al., 2017; Snell et al., 2017; Sung et al., 2018), which aims to recognize a set of novel classes with only a few labeled samples by transferring knowledge from a set of base classes with abundant samples.
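The abstract names ProtoNet (Snell et al., 2017) as the baseline MELR builds on; a minimal sketch of its nearest-prototype classification over one episode, assuming pre-computed embeddings and using NumPy for illustration:

```python
import numpy as np

def prototype_logits(support_feats, support_labels, query_feats):
    """ProtoNet-style classification: assign queries to the nearest class prototype.

    support_feats: (n_support, d) embeddings of the support (shot) images.
    support_labels: (n_support,) integer labels in [0, n_way).
    query_feats: (n_query, d) embeddings of the query images.
    Returns (n_query, n_way) logits = negative squared Euclidean distances.
    """
    n_way = int(support_labels.max()) + 1
    # Each prototype is the mean embedding of that class's support shots,
    # which is exactly why a poorly-sampled shot skews the classifier.
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in range(n_way)])            # (n_way, d)
    diffs = query_feats[:, None, :] - protos[None, :, :]  # (n_query, n_way, d)
    return -np.sum(diffs ** 2, axis=2)                    # closer => larger logit
```

The comment about skew is the motivation for MELR: with only k shots per class, one unrepresentative instance can shift a prototype substantially.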

