DATASET META-LEARNING FROM KERNEL RIDGE-REGRESSION

Abstract

One of the most fundamental aspects of any machine learning algorithm is the training data used by the algorithm. We introduce the novel concept of ε-approximation of datasets, obtaining datasets that are much smaller than, or are significant corruptions of, the original training data while maintaining similar model performance. We introduce a meta-learning algorithm called Kernel Inducing Points (KIP) for obtaining such datasets, inspired by recent developments in the correspondence between infinitely-wide neural networks and kernel ridge-regression (KRR). For KRR tasks, we demonstrate that KIP can compress datasets by one or two orders of magnitude, significantly improving on previous dataset distillation and subset selection methods while obtaining state-of-the-art results for MNIST and CIFAR-10 classification. Furthermore, our KIP-learned datasets are transferable to the training of finite-width neural networks even beyond the lazy-training regime, which leads to state-of-the-art results for neural network dataset distillation, with potential applications to privacy preservation.

1. INTRODUCTION

Datasets are a pivotal component in any machine learning task. Typically, a machine learning problem regards a dataset as given and uses it to train a model according to some specific objective. In this work, we depart from this traditional paradigm by instead optimizing a dataset with respect to a learning objective, so that the resulting dataset can be used in a range of downstream learning tasks. Our work is directly motivated by several challenges in existing learning methods. Kernel methods and instance-based learning methods (Vinyals et al., 2016; Snell et al., 2017; Kaya & Bilge, 2019) generally require a support dataset to be deployed at inference time. Achieving good prediction accuracy typically requires a large support set, which inevitably increases both memory footprint and latency at inference time: the scalability issue. Deploying a support set of original examples, e.g., distributing raw images to user devices, can also raise privacy concerns. Additional scalability challenges include, for instance, the desire for rapid hyper-parameter search (Shleifer & Prokop, 2019) and minimizing the resources consumed when replaying data for continual learning (Borsos et al., 2020). A valuable contribution to all these problems would be to find surrogate datasets that mitigate the challenges posed by naturally occurring datasets without a significant sacrifice in performance.

This suggests the following:

Question: What is the space of datasets, possibly with constraints in regard to size or signal preserved, whose trained models are all (approximately) equivalent to some specific model?

In attempting to answer this question, in the setting of supervised learning on image data, we discover a rich variety of datasets, diverse in size and human interpretability while also robust to model architectures, which yield high performance or state-of-the-art (SOTA) results when used as training data. We obtain such datasets through the introduction of a novel meta-learning algorithm called Kernel Inducing Points (KIP). Figure 1 shows some example images from our learned datasets.
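To make the link between KIP and KRR concrete, the following is a minimal NumPy sketch of a KIP-style objective, written under assumptions that are not taken from the text above: a Gaussian (RBF) kernel stands in for the neural-network-derived kernels that motivate KIP, the support labels are held fixed, the loss is a mean squared error over the full target set, and the support inputs are updated by plain gradient descent using a hand-derived gradient rather than automatic differentiation. The names `rbf_kernel`, `kip_loss_and_grad`, and `kip` are hypothetical. The sketch fits KRR on a small learnable support set, scores its predictions on the original (target) data, and moves the support points to reduce that error.

```python
import numpy as np

# Hypothetical helpers; the RBF kernel is an assumed stand-in for neural-network kernels.


def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))


def kip_loss_and_grad(x_s, y_s, x_t, y_t, sigma=1.0, reg=0.1):
    """Fit KRR on the support set (x_s, y_s) and score it on the target set (x_t, y_t).

    Returns 0.5 * mean((K_ts (K_ss + reg*I)^-1 y_s - y_t)^2) and its gradient
    with respect to the support inputs x_s (support labels are held fixed).
    """
    k_ss = rbf_kernel(x_s, x_s, sigma)
    k_ts = rbf_kernel(x_t, x_s, sigma)
    a = k_ss + reg * np.eye(len(x_s))
    alpha = np.linalg.solve(a, y_s)          # KRR dual coefficients
    resid = k_ts @ alpha - y_t               # prediction error on the target set
    loss = 0.5 * np.mean(resid**2)

    # Reverse-mode derivative through the KRR predictor, written out by hand.
    g_resid = resid / resid.size             # dL/d(prediction)
    g_ts = g_resid @ alpha.T                 # dL/dK_ts
    beta = np.linalg.solve(a, k_ts.T @ g_resid)
    g_ss = -beta @ alpha.T                   # dL/dK_ss, using d(A^-1) = -A^-1 dA A^-1

    # Chain rule through the RBF kernel: dK(u, v)/dv = K(u, v) (u - v) / sigma^2.
    w_ts = g_ts * k_ts
    grad = w_ts.T @ x_t - w_ts.sum(0)[:, None] * x_s
    w_ss = g_ss * k_ss
    s = w_ss + w_ss.T                        # x_s enters K_ss as both arguments
    grad += s @ x_s - s.sum(1)[:, None] * x_s
    return loss, grad / sigma**2


def kip(x_s, y_s, x_t, y_t, steps=300, lr=0.01, sigma=1.0, reg=0.1):
    """Plain gradient descent on the support inputs; returns them and the loss history."""
    x_s = x_s.copy()
    losses = []
    for _ in range(steps):
        loss, grad = kip_loss_and_grad(x_s, y_s, x_t, y_t, sigma, reg)
        losses.append(loss)
        x_s -= lr * grad
    return x_s, losses


# Toy usage: two Gaussian blobs; summarize 200 target points with 4 support points.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
labels = rng.integers(0, 2, size=200)
x_target = centers[labels] + rng.normal(size=(200, 2))
y_target = np.eye(2)[labels]                 # one-hot targets
x_support = rng.normal(size=(4, 2))          # random initialization of support inputs
y_support = np.eye(2)[[0, 0, 1, 1]]          # fixed support labels, two per class
x_learned, losses = kip(x_support, y_support, x_target, y_target)
print(f"KIP loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

On this toy problem the printed loss is expected to decrease as the four support points move. The kernel, the regularizer `reg`, and the step size `lr` are illustrative choices only; a faithful reproduction would use the kernels, optimizer, and batching described in the paper.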

