INFORMATION THEORETIC META LEARNING WITH GAUSSIAN PROCESSES

Abstract

We formulate meta learning using information-theoretic concepts such as mutual information and the information bottleneck. The idea is to learn a stochastic representation or encoding of the task description, given by a training or support set, that is highly informative about predicting the validation set. By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning. In particular, we develop new memory-based meta learning algorithms based on Gaussian processes and derive extensions that combine memory- and gradient-based meta learning. We demonstrate our method on few-shot regression and classification using standard benchmarks such as Omniglot, mini-ImageNet and Augmented Omniglot.

1. INTRODUCTION

Meta learning (Ravi & Larochelle, 2017; Vinyals et al., 2016; Edwards & Storkey, 2017; Finn et al., 2017; Lacoste et al., 2019; Nichol et al., 2018) and few-shot learning (Li et al., 2006; Lake et al., 2011) aim to derive data-efficient learning algorithms that can rapidly adapt to new tasks. Such systems require training deep neural networks on a set of tasks drawn from a common distribution, where each task is described by a small amount of experience, typically divided into a training or support set and a validation set. By sharing information across tasks, the neural network can learn to rapidly adapt to new tasks and generalize from few examples at test time. Several few-shot learning algorithms use memory-based (Vinyals et al., 2016; Ravi & Larochelle, 2017) or gradient-based procedures (Finn et al., 2017; Nichol et al., 2018), with the gradient-based model-agnostic meta learning algorithm (MAML) by Finn et al. (2017) being very influential in the literature. Despite the success of specific schemes, one fundamental issue in meta learning is the lack of unified principles that relate different approaches and suggest new schemes. While there exist probabilistic interpretations of existing methods, such as the approximate Bayesian inference approach (Grant et al., 2018; Finn et al., 2018; Yoon et al., 2018) and the related conditional probability modelling approach (Garnelo et al., 2018; Gordon et al., 2019), meta learning still lacks a general and tractable learning principle that can deepen our understanding of existing algorithms and lead to new methods. To this end, the main contribution of this paper is to introduce an information-theoretic view of meta learning, utilizing tools such as the mutual information and the information bottleneck (Cover & Thomas, 2006; Tishby et al., 1999).
Given that each task consists of a support or training set and a target or validation set, we consider the information bottleneck principle, introduced by Tishby et al. (1999), which can learn a stochastic encoding of the support set that is highly informative about predicting the validation set. This stochastic encoding is optimized through the difference between two mutual informations, so that the encoding compresses the training set into a representation that predicts the validation set well. By exploiting recent variational approximations to the information bottleneck (Alemi et al., 2017; Chalk et al., 2016; Achille & Soatto, 2016) that make use of variational lower bounds on the mutual information (Barber & Agakov, 2003), we derive a general and tractable framework for meta learning. This framework allows us to re-interpret gradient-based algorithms, such as MAML, and also to derive new methods. Based on the variational information bottleneck (VIB) framework (Alemi et al., 2017; Chalk et al., 2016; Achille & Soatto, 2016), we introduce a new memory-based algorithm for supervised few-shot learning (right panel in Figure 1) based on Gaussian processes (Rasmussen & Williams, 2006)
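A minimal formal statement may make the trade-off above concrete. Writing $D^t$ for a task's support set, $D^v$ for its validation set, and $Z$ for the stochastic encoding (this notation is our own shorthand, following Tishby et al. (1999) and Alemi et al. (2017), and is not fixed by the text above), the bottleneck objective and the variational bound that makes it tractable can be sketched as:

```latex
% Information bottleneck for meta learning (notation assumed):
% learn an encoder p(z | D^t) whose code Z is predictive of D^v
% while compressing the support set D^t, with trade-off weight \beta.
\max_{p(z \mid D^t)} \; I(Z; D^v) \;-\; \beta \, I(Z; D^t), \qquad \beta \ge 0.

% Tractability follows from the Barber & Agakov (2003) variational
% lower bound, where q(D^v | z) is a variational decoder and H(D^v)
% is constant with respect to the encoder:
I(Z; D^v) \;\ge\; \mathbb{E}_{p(D^v, z)}\!\left[ \log q(D^v \mid z) \right] \;+\; H(D^v).
```

Maximizing the lower bound over both the encoder and the decoder therefore yields the tractable surrogate objective used throughout the VIB literature.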
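Since the memory-based algorithm builds on Gaussian process prediction, it may help to recall how a GP adapts to a single few-shot task by exact conditioning on its support set. The sketch below is illustrative only: the RBF kernel, the hyperparameter values, and the toy sine-regression task are our assumptions, not the paper's actual model.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X_s, y_s, X_q, noise=1e-2):
    """Condition a zero-mean GP prior on the support set (X_s, y_s)
    and return the predictive mean and variance at query points X_q."""
    K_ss = rbf_kernel(X_s, X_s) + noise * np.eye(len(X_s))
    K_qs = rbf_kernel(X_q, X_s)
    L = np.linalg.cholesky(K_ss)                       # stable inversion of K_ss
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_s))
    mean = K_qs @ alpha                                # predictive mean
    v = np.linalg.solve(L, K_qs.T)
    var = np.diag(rbf_kernel(X_q, X_q)) - np.sum(v**2, axis=0)  # predictive variance
    return mean, var

# Toy task: 5-shot regression on a sine function (hypothetical example).
rng = np.random.default_rng(0)
X_s = rng.uniform(-3, 3, (5, 1))        # support inputs
y_s = np.sin(X_s).ravel()               # support targets
X_q = np.linspace(-3, 3, 50)[:, None]   # query / validation inputs
mean, var = gp_predict(X_s, y_s, X_q)
```

The appeal of GPs in this setting is that per-task adaptation requires no gradient steps: conditioning on the support set is a closed-form operation, which is what makes them natural building blocks for memory-based meta learning.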

