MULTI-REPRESENTATION ENSEMBLE IN FEW-SHOT LEARNING

Abstract

Deep neural networks (DNNs) compute representations in a layer-by-layer fashion, producing a final representation at the top of the pipeline, and classification or regression is performed using this final representation. A number of DNNs (e.g., ResNet, DenseNet) have shown that representations from earlier layers can be beneficial; they improved performance by aggregating representations from different layers. In this work, we ask whether, besides forming an aggregation, these representations can be fed directly to the classification layer(s) to obtain better performance. We began our quest for the answer by examining classifiers built on the representations from different layers and observed that these classifiers were diverse and that many of their decisions were complementary to each other, hence having the potential to yield a better overall decision when combined. Following this observation, we propose an ensemble method that builds an ensemble of classifiers, each taking as input a representation from a different depth of a base DNN. We evaluated this ensemble method in the setting of few-shot learning, conducting experiments on the mini-ImageNet and tiered-ImageNet datasets, which are commonly used in the evaluation of few-shot learning methods. Our ensemble achieves new state-of-the-art results on both datasets, compared to previous single-network and ensemble approaches.

1. INTRODUCTION

The depth of a deep neural network is a main factor contributing to the high capacity of the network. In deep neural networks, information is typically processed layer by layer through many layers before it is fed to the final classification (or regression) layer(s). From a representation-learning point of view, a representation is computed sequentially through the layers, and the final representation is used to perform the target task. Several deep neural networks try to exploit the lower layers in this sequence to achieve better learning results. GoogLeNet (Szegedy et al., 2015) added auxiliary losses at lower layers to facilitate training. Skip connections (such as those used in ResNet (He et al., 2016) and DenseNet (Huang et al., 2017)) may be added to connect lower layers to higher ones in a deep architecture. Even though the main purpose of these approaches is to assist the training process or to help gradient back-propagation, their success suggests that the representations from the lower layers may be beneficial to many learning tasks. It is therefore worth rethinking the standard sequential structure in which only a final representation is used to make the prediction.

In this work, we ask whether the representations from the lower layers can be used directly (instead of being auxiliary or being aggregated into a final representation) for decision making. If so, how can we take advantage of these lower-level representations, and what are good practices for doing so? We first investigated the problem by performing classification with the representations from different layers: we took the convolutional layers of a trained network as an encoder and tested the representations (feature maps) from its different layers for their classification performance.
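This per-layer probing setup can be illustrated with a minimal, hypothetical sketch in plain NumPy. The random linear+ReLU "encoder", the synthetic Gaussian-blob data, and the nearest-centroid classifier below are all stand-ins of our choosing, not the paper's actual architecture or classifier; the sketch only shows the mechanics of fitting one classifier per encoder depth:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(dims):
    """Random linear+ReLU layers standing in for a trained conv encoder."""
    return [rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def representations(encoder, x):
    """Return the representation after every layer (one per depth)."""
    reps = []
    for w in encoder:
        x = np.maximum(x @ w, 0.0)  # linear map followed by ReLU
        reps.append(x)
    return reps

def nearest_centroid(train_feats, train_labels, test_feats):
    """Fit one centroid per class; predict the class of the nearest centroid."""
    classes = np.unique(train_labels)
    centroids = np.stack([train_feats[train_labels == c].mean(axis=0)
                          for c in classes])
    dists = ((test_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[dists.argmin(axis=1)]

# Synthetic 3-class data: one Gaussian blob per class in a 16-d input space.
means = rng.standard_normal((3, 16)) * 3.0
y_train = np.repeat(np.arange(3), 30)
y_test = np.repeat(np.arange(3), 10)
x_train = means[y_train] + rng.standard_normal((90, 16))
x_test = means[y_test] + rng.standard_normal((30, 16))

# One probing classifier per encoder depth, each on that depth's features.
encoder = make_encoder([16, 32, 32, 32])
per_layer_acc = []
for tr, te in zip(representations(encoder, x_train),
                  representations(encoder, x_test)):
    preds = nearest_centroid(tr, y_train, te)
    per_layer_acc.append(float((preds == y_test).mean()))
print(per_layer_acc)  # one probing accuracy per encoder depth
```

In an actual experiment the encoder would be a trained network and the features would be pooled feature maps, but the loop structure — one independent classifier per depth — is the same.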
We observed that although the feature maps from the higher layers led to better performance overall, there was a significant number of cases in which correct predictions could be made with the lower-level feature maps while the higher-level ones failed. This suggests that the lower-level representations have the potential to help classification directly (detailed analysis in Section 3).
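The kind of complementarity described above can be quantified directly from per-layer predictions, and a plurality vote over the per-depth classifiers is one simple way to exploit it. The prediction lists below are made up for illustration and are not results from the paper's experiments:

```python
from collections import Counter

# Hypothetical predictions from three per-depth classifiers on six samples.
true_labels = [0, 1, 2, 0, 1, 2]
layer_preds = [
    [0, 1, 0, 0, 2, 2],  # shallow-layer classifier
    [0, 1, 2, 1, 1, 0],  # mid-layer classifier
    [0, 1, 2, 0, 1, 1],  # top-layer classifier
]

# Cases the top layer gets wrong but some lower layer gets right:
rescuable = sum(
    1
    for i, y in enumerate(true_labels)
    if layer_preds[-1][i] != y and any(p[i] == y for p in layer_preds[:-1])
)

# Plurality vote across the per-layer classifiers
# (ties are broken by first-seen order, a property of Counter.most_common).
def vote(preds_per_layer, i):
    return Counter(p[i] for p in preds_per_layer).most_common(1)[0][0]

ensemble_preds = [vote(layer_preds, i) for i in range(len(true_labels))]
top_acc = sum(p == y for p, y in zip(layer_preds[-1], true_labels)) / len(true_labels)
ens_acc = sum(p == y for p, y in zip(ensemble_preds, true_labels)) / len(true_labels)
print(rescuable, top_acc, ens_acc)
```

In this toy case the top-layer classifier misses one sample that a lower layer gets right, and the vote recovers it; this is exactly the complementarity the ensemble is designed to exploit.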

