CHAMELEON: LEARNING MODEL INITIALIZATIONS ACROSS TASKS WITH DIFFERENT SCHEMAS

Abstract

Parametric models, and particularly neural networks, require weight initialization as a starting point for gradient-based optimization. Recent work shows that an initial parameter set can be learned from a population of supervised learning tasks that enables fast convergence on unseen tasks even when only a handful of instances is available (model-agnostic meta-learning). Currently, methods for learning model initializations are limited to a population of tasks sharing the same schema, i.e., the same number, order, type, and semantics of predictor and target variables. In this paper, we address the problem of meta-learning weight initializations across tasks with different schemas, for example, when the number of predictors varies across tasks while the tasks still share some variables. We propose Chameleon, a model that learns to align different predictor schemas to a common representation. In experiments on 23 datasets of the OpenML-CC18 benchmark, we show that Chameleon can successfully learn parameter initializations across tasks with different schemas, presenting, to the best of our knowledge, the first cross-dataset few-shot classification approach for unstructured data.

1. INTRODUCTION

Humans require only a few examples to correctly classify new instances of previously unknown objects. For example, it is sufficient to see a handful of images of a specific type of dog before being able to classify dogs of this type consistently. In contrast, deep learning models optimized in a classical supervised setup usually require a vast number of training examples to match human performance. A striking difference is that a human has already learned to classify countless other objects, while the parameters of a neural network are typically initialized randomly. Previous approaches improved this starting point for gradient-based optimization by choosing a more robust random initialization (He et al., 2015) or by starting from a pretrained network (Pan & Yang, 2010). Still, models do not learn from only a handful of training examples even when applying these techniques. Moreover, established hyperparameter optimization methods (Schilling et al., 2016) are not capable of optimizing the model initialization due to the high-dimensional parameter space.

Few-shot classification aims at correctly classifying unseen instances of a novel task with only a few labeled training instances given. This is typically accomplished by meta-learning across a set of training tasks, which consist of training and validation examples with given labels for a set of classes. The field has gained immense popularity among researchers after recent meta-learning approaches have shown that it is possible to learn a weight initialization across different tasks, which facilitates a faster convergence speed and thus enables classifying novel classes after seeing only a few instances (Finn et al., 2018). However, training a single model across different tasks is only feasible if all tasks share the same schema, meaning that all instances share one set of features in identical order.
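To make the idea of meta-learning an initialization concrete, the following is a minimal, illustrative sketch of a Reptile-style outer loop (Nichol et al., 2018) on toy one-parameter regression tasks; the task distribution, learning rates, and step counts are assumptions chosen purely for illustration, not the setup used in this paper.

```python
import numpy as np

# Toy Reptile sketch: learn an initialization theta across tasks so that a
# few inner SGD steps adapt it quickly. Each task is a linear regression
# y = w * x with its own slope w. All hyperparameters are illustrative.

rng = np.random.default_rng(0)

def sample_task():
    """Draw a toy regression task with a task-specific slope."""
    w = rng.uniform(-2.0, 2.0)
    x = rng.uniform(-1.0, 1.0, size=10)
    return x, w * x

def inner_sgd(theta, x, y, lr=0.1, steps=5):
    """A few gradient steps on one task, starting from the shared init."""
    phi = theta
    for _ in range(steps):
        grad = 2.0 * np.mean((phi * x - y) * x)  # gradient of the MSE w.r.t. phi
        phi -= lr * grad
    return phi

theta = 0.0          # meta-learned initialization (a single weight here)
meta_lr = 0.5
for _ in range(200):  # outer loop over sampled tasks
    x, y = sample_task()
    phi = inner_sgd(theta, x, y)
    theta += meta_lr * (phi - theta)  # Reptile update: move init toward adapted weights
```

The outer update nudges the initialization toward the weights reached after task-specific adaptation, so that new tasks from the same distribution converge within a few gradient steps.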
For that reason, most approaches demonstrate their performance on image data, which can easily be scaled to a fixed shape, whereas transforming unstructured data to a uniform schema is not trivial. We want to extend popular approaches to operate invariantly to the schema, i.e., independently of feature order and shape, making it possible to use meta-learning approaches on unstructured data with varying feature spaces, e.g., learning a model from heart disease data that can accurately classify a few-shot task for diabetes detection that relies on similar features. Thus, we require a schema-invariant encoder that maps heart disease and diabetes data to one feature representation, which can then be used to train a single model via popular meta-learning algorithms like REPTILE (Nichol et al., 2018b).
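To illustrate what aligning different predictor schemas to a common representation means, the following sketch hand-codes an alignment from feature names into an assumed shared feature space. Chameleon *learns* such an alignment rather than deriving it from known feature names; the schema, feature names, and values below are hypothetical.

```python
import numpy as np

# Hypothetical schema-alignment sketch: tasks with different predictor sets
# are mapped into one shared feature space so a single model can be trained
# across them. Here the alignment is hand-coded from feature names purely
# for illustration; Chameleon learns it instead.

SHARED_SCHEMA = ["age", "blood_pressure", "cholesterol", "glucose"]  # assumed union

def alignment_matrix(task_features):
    """Binary matrix A such that x_task @ A lies in the shared schema order;
    shared-schema features absent from the task receive zero columns."""
    A = np.zeros((len(task_features), len(SHARED_SCHEMA)))
    for i, name in enumerate(task_features):
        if name in SHARED_SCHEMA:
            A[i, SHARED_SCHEMA.index(name)] = 1.0
    return A

# Two tasks with different schemas that share some variables:
heart = np.array([[63.0, 145.0, 233.0]])    # age, blood_pressure, cholesterol
diabetes = np.array([[148.0, 50.0, 72.0]])  # glucose, age, blood_pressure

heart_aligned = heart @ alignment_matrix(["age", "blood_pressure", "cholesterol"])
diab_aligned = diabetes @ alignment_matrix(["glucose", "age", "blood_pressure"])
```

After alignment, both tasks live in the same four-dimensional representation, so a single meta-learned model can consume instances from either dataset.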

