CONCEPT LEARNERS FOR FEW-SHOT LEARNING

Abstract

Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. Here we propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, COMET learns mappings of high-level concepts into semi-structured metric spaces, and effectively combines the outputs of independent concept learners. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization, and cell type annotation on a novel dataset from a biological domain developed in our work. COMET significantly outperforms strong meta-learning baselines, achieving 6-15% relative improvement on the most challenging 1-shot learning tasks, while, unlike existing methods, providing interpretations behind the model's predictions.

1. INTRODUCTION

Deep learning has reached human-level performance on domains with an abundance of large-scale labeled training data. However, learning on tasks with a small number of annotated examples is still an open challenge. Due to the lack of training data, models often overfit or are too simplistic to provide good generalization. In contrast, humans can learn new tasks very quickly by drawing upon prior knowledge and experience. This ability to rapidly learn and adapt to new environments is a hallmark of human intelligence. Few-shot learning (Miller et al., 2000; Fei-Fei et al., 2006; Koch et al., 2015) aims at addressing this fundamental challenge by designing algorithms that are able to generalize to new tasks given only a few labeled training examples. Meta-learning (Schmidhuber, 1987; Bengio et al., 1992) has recently made major advances in the field by explicitly optimizing the model's ability to generalize, or learning how to learn, from many related tasks (Snell et al., 2017; Vinyals et al., 2016; Ravi & Larochelle, 2017; Finn et al., 2017). Motivated by the way humans effectively use prior knowledge, meta-learning algorithms acquire prior knowledge over previous tasks so that new tasks can be efficiently learned from a small amount of data. However, recent works (Chen et al., 2019b; Raghu et al., 2020) show that simple baseline methods perform comparably to existing meta-learning methods, raising the question of which components are crucial for rapid adaptation and generalization. Here, we argue that there is an important missing piece in this puzzle. Human knowledge is structured in the form of reusable concepts. For instance, when we learn to recognize new bird species we are already equipped with the critical concepts, such as wing, beak, and feather. We then focus on these specific concepts and combine them to identify a new species.
While learning to recognize new species is challenging in the complex bird space, it becomes remarkably simpler once the reasoning is structured into familiar concepts. Moreover, such a structured way of cognition gives us the ability to provide reasoning behind our decisions, such as "ravens have thicker beaks than crows, with more of a curve to the end". We argue that this lack of structure is limiting the generalization ability of current meta-learners. The importance of compositionality for few-shot learning was emphasized in (Lake et al., 2011; 2015), where hand-designed features of strokes were combined using Bayesian program learning. Motivated by the structured form of human cognition, we propose COMET, a meta-learning method that discovers generalizable representations along human-interpretable concept dimensions. COMET learns a unique metric space for each concept dimension using concept-specific embedding functions, named concept learners, that are parameterized by deep neural networks. Along each high-level dimension, COMET defines concept prototypes that reflect class-level differences in the metric space of the underlying concept. To obtain final predictions, COMET effectively aggregates information from diverse concept learners and concept prototypes. Three key aspects lead to the strong generalization ability of our approach: (i) semi-structured representation learning, (ii) concept-specific metric spaces described with concept prototypes, and (iii) ensembling of many models. The latter ensures that the combination of diverse and accurate concept learners improves the generalization ability of the base learner (Hansen & Salamon, 1990; Dvornik et al., 2019). Remarkably, the high-level universe of concepts that are used to guide our algorithm can be discovered in a fully unsupervised way, or we can use external knowledge bases to define concepts.
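The prediction rule sketched above can be made concrete with a small numerical example. The sketch below is illustrative only: it assumes random vectors in place of the learned concept-specific embeddings, uniform concept importance weights, and squared Euclidean distance; array names and sizes are our own placeholders, not part of the method's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N_c classes, K support shots per class,
# J concept dimensions, per-concept embedding size D.
N_c, K, J, D = 5, 1, 4, 16

# support_emb[j, c, k] stands in for the concept-j embedding of
# support example k of class c (learned by a concept learner in the paper).
support_emb = rng.normal(size=(J, N_c, K, D))
query_emb = rng.normal(size=(J, D))   # concept embeddings of one query example
w = np.ones(J) / J                    # concept importance weights (assumed uniform)

# Concept prototypes: per-concept mean of each class's support embeddings.
prototypes = support_emb.mean(axis=2)                             # (J, N_c, D)

# Squared Euclidean distance of the query to every prototype, per concept.
dists = ((prototypes - query_emb[:, None, :]) ** 2).sum(axis=-1)  # (J, N_c)

# Aggregate distances across concepts with importance weights; the class
# with the smallest weighted distance (largest score) is predicted.
scores = -(w[:, None] * dists).sum(axis=0)                        # (N_c,)
pred = int(np.argmax(scores))
print("predicted class:", pred)
```

With per-concept distances kept separate until the final weighted sum, inspecting `w[:, None] * dists` shows how much each concept contributed to the decision, which is the basis for the interpretability discussed next.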
In particular, we can start from a large universe of noisy, incomplete, and redundant concepts, and COMET learns which subsets of those are important by assigning local and global concept importance scores. Unlike existing methods (Snell et al., 2017; Vinyals et al., 2016; Sung et al., 2018; Gidaris & Komodakis, 2018), COMET's predictions are interpretable, an advantage especially important in the few-shot learning setting, where predictions are based only on a handful of labeled examples, making it hard to trust the model. As such, COMET is the first domain-agnostic interpretable meta-learning approach. We demonstrate the effectiveness of our approach on tasks from extremely diverse domains, including fine-grained image classification in computer vision, document classification in natural language processing, and cell type annotation in biology. In the biological domain, we conduct the first systematic comparison of meta-learning algorithms. We develop a new meta-learning dataset and define a novel benchmark task to characterize the single-cell transcriptomes of all mouse organs (Consortium, 2018; 2020). Additionally, we consider the scenario in which concepts are not given in advance, and test COMET's performance with automatically extracted visual concepts. Our experimental results show that on all domains COMET significantly improves generalization ability, achieving 6-15% relative improvement over state-of-the-art methods in the most challenging 1-shot task. Furthermore, we demonstrate the ability of COMET to provide interpretations behind the model's predictions, and support our claim with quantitative and qualitative evaluations of the generated explanations.

2. PROPOSED METHOD

Problem formulation. In few-shot classification, we assume that we are given a labeled training set D^tr, an unlabeled query set D^qr, and a support set S consisting of a few labeled data points that share the label space with the query set. The label spaces of the training and query sets are disjoint, i.e., Y^tr ∩ Y^qr = ∅, where Y^tr denotes the label space of the training set and Y^qr denotes the label space of the query set.
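To make the episodic setup concrete, the toy sketch below samples an N-way K-shot episode from novel classes whose labels never appear during training. All names (`sample_episode`, the class names, the per-class example pools) are hypothetical placeholders for illustration, not part of any benchmark.

```python
import random

random.seed(0)

# Training and query label spaces are disjoint: novel classes are held out.
train_classes = {"sparrow", "finch", "warbler"}
test_classes = {"raven", "crow"}
assert train_classes.isdisjoint(test_classes)

def sample_episode(data, classes, n_way=2, k_shot=1, q_queries=2):
    """Sample an N-way K-shot episode: a small labeled support set S and
    an unlabeled query set drawn from the same novel classes."""
    episode_classes = random.sample(sorted(classes), n_way)
    support, query = [], []
    for c in episode_classes:
        examples = random.sample(data[c], k_shot + q_queries)
        support += [(x, c) for x in examples[:k_shot]]   # labeled support
        query += examples[k_shot:]                       # unlabeled queries
    return support, query

# Hypothetical per-class example pools (strings stand in for real inputs).
data = {c: [f"{c}_{i}" for i in range(10)] for c in train_classes | test_classes}
support, query = sample_episode(data, test_classes)
print(len(support), len(query))  # 2 support examples, 4 queries (2-way 1-shot)
```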



Figure 1: Along each concept dimension, COMET learns concept embeddings using independent concept learners and compares them to concept prototypes. COMET then effectively aggregates information across concept dimensions, assigning concept importance scores to each dimension.

