ALPHA NET: ADAPTATION WITH COMPOSITION IN CLASSIFIER SPACE

Abstract

Deep learning classification models typically perform poorly on classes with few training examples. Motivated by the human ability to learn from few samples, models have been developed that transfer knowledge from classes with many examples to classes with few. Critically, the majority of these models transfer knowledge within model feature space. In this work, we demonstrate that transferring knowledge within classifier space is more effective and efficient. Specifically, by linearly combining strong nearest-neighbor classifiers with a weak classifier, we compose a stronger classifier. Uniquely, our model can be implemented on top of any existing classification model that includes a classifier layer. We showcase the success of our approach on the task of long-tailed recognition, in which the classes with few examples, otherwise known as the "tail" classes, suffer the most in performance and are the most challenging to learn. Using classifier-level knowledge transfer, we drastically improve state-of-the-art performance on the "tail" categories, by a margin as high as 10.5%.

1. INTRODUCTION

The computer vision field has made rapid progress in object recognition due to several factors: complex architectures, greater compute power, more data, and better learning strategies. However, the standard method for training recognition models on new classes still relies on large sets of examples. This dependence on large-scale data has made learning from few samples a natural challenge. Highlighting this point, new tasks such as low-shot learning and long-tailed learning have recently become common within computer vision.

Many approaches to learning from small numbers of examples are inspired by human learning. In particular, humans are able to learn new concepts quickly and efficiently from only a few samples. The overarching theory is that humans transfer knowledge from previous experiences to bootstrap new learning tasks (Lake et al., 2017; 2015; Gopnik & Sobel, 2000). Inherent in these remarkable capabilities are two related questions: what knowledge is being transferred, and how is this knowledge being transferred?

Within computer vision, recent low-shot learning and long-tailed recognition models answer these questions by treating visual "representations" as the knowledge structures being transferred. As such, the knowledge transfer methods implemented in these models transfer features learned from known classes with large data to the learning of new classes with little data (Liu et al., 2019; Yin et al., 2019). These models exemplify the broader assumption that, in both human and computer vision, knowledge transfer occurs within model representation and feature space (Lake et al., 2015). In contrast, we claim that previously learned information is more concisely captured in classifier space. This claim is based on the fact that a sample's representation is unique to that sample, whereas a classifier is fitted to all the samples in a given class.
The success of working within classifier space to improve certain classifiers has been established in several papers (Elhoseiny et al., 2013; Qi et al., 2018), where models directly predict classifiers from features or create new models entirely by learning from other models. Other non-deep-learning models use classifiers learned with abundant data to generate novel classifiers (Aytar & Zisserman, 2011; 2012).
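To make the idea of composition in classifier space concrete, the following is a minimal NumPy sketch of the operation described above: the weight vector of a weak "tail" classifier is linearly combined with the weight vectors of its nearest strong classifiers. All names and the fixed combination coefficients here are illustrative assumptions; in the actual model the coefficients would be learned, not hand-set.

```python
import numpy as np

def compose_classifier(weak_w, neighbor_ws, alphas):
    """Compose a stronger classifier by linearly combining a weak
    classifier's weight vector with those of its nearest strong
    classifiers. `alphas` holds one combination coefficient per
    neighbor (illustrative; in practice these would be learned)."""
    return weak_w + sum(a * w for a, w in zip(alphas, neighbor_ws))

# Toy example: a 4-dimensional feature space with two strong neighbors.
rng = np.random.default_rng(0)
weak = rng.normal(size=4)                            # data-poor "tail" class
neighbors = [rng.normal(size=4) for _ in range(2)]   # data-rich neighbors
alphas = [0.3, 0.1]                                  # hypothetical coefficients

strong = compose_classifier(weak, neighbors, alphas)
score = strong @ rng.normal(size=4)  # the composed vector scores features as usual
```

Because the composition acts only on classifier weight vectors, it can sit on top of any model whose final layer is a linear classifier, leaving the feature extractor untouched.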

