GRADIENT-BASED TRANSFER LEARNING

Abstract

We formulate transfer learning as a meta-learning problem by extending the standard meta-learning paradigm so that support and query data are drawn from different, but related, distributions of tasks. Inspired by the success of Gradient-Based Meta-Learning, we propose to extend it to the transfer learning setting by constructing a general encoder-decoder architecture that learns a map between functionals of different domains. This is achieved by leveraging the idea that the task-adapted parameters of a meta-learner can serve as an informative representation of the task itself. We demonstrate the proposed method on regression, prediction of dynamical systems, and meta-imitation learning problems.

1. INTRODUCTION

The ability to quickly adapt to unseen conditions is a necessary skill for any intelligent system. It provides the means to generalize outside of the training conditions as well as the capacity to extract unobservable features affecting the learner (Lake et al. (2017)). Adaptation to a new task involves two steps: first, inferring the information characterizing the task at hand; second, regressing the function representing the task. The importance of this ability is reflected in the considerable volume of work conducted on the matter in recent years, e.g. Hospedales et al. (2021); Ben-David et al. (2006); Ljung (2010). The field of meta-learning provides the means to unify these two steps and learn them simultaneously in a fully data-driven manner (Huisman et al. (2021)). The learning process comprises multiple datasets representing the different conditions, or tasks, the learner is concurrently exposed to. Adaptation is performed by extracting the relevant information about each task from a small set of data sampled from that task.

In this paper we consider the case of transferring knowledge from one task to another, different task using a small set of data. In this regard, we build upon the framework of few-shot learning (Wang et al. (2020)), which can be summarized as estimating an optimal learner for any task from the fewest data samples possible. Recent work has explored the case where the data used for adaptation and the downstream task's data are subject to a distributional shift in their domain, referred to as support-query shift (Bennequin et al. (2021)). Here, we assume the more general formulation of meta-transfer, in which the shift can take place on both the domain and the co-domain of the underlying function generating the data. This takes us beyond the problem of domain shift and into the more general notion of learning to transfer between a support task and a query task. The need for transfer emerges in a multitude of situations.
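As context, the inner-loop adaptation at the heart of gradient-based meta-learning can be sketched on a toy problem. The sketch below uses a linear base-learner with a squared loss and an analytic gradient; the function names and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def inner_adapt(theta, X_support, y_support, lr=0.2, steps=300):
    """Task adaptation: gradient steps on the support set.

    Linear model y = X @ theta with squared loss; the adapted parameters
    carry the task-specific information extracted from the support data.
    """
    theta = theta.copy()
    for _ in range(steps):
        grad = 2.0 / len(X_support) * X_support.T @ (X_support @ theta - y_support)
        theta -= lr * grad
    return theta

# Shared initialization (in GBML this would itself be meta-learned).
theta0 = np.zeros(2)

# One task: y = 3*x + 1, observed through ten support samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10)
X = np.stack([x, np.ones_like(x)], axis=1)
y = 3.0 * x + 1.0

theta_task = inner_adapt(theta0, X, y)  # approaches [3.0, 1.0]
```

After adaptation, `theta_task` recovers the slope and intercept of this particular task, which is the sense in which the adapted parameters identify the task.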
Sequential decision-making problems are one of them. Real-world dynamical systems, for example, are often only partially observable. They require an initial exploration phase to gather the necessary information before estimating a suitable policy. In this case, we need a way to transfer the knowledge acquired about the dynamics of the system to the estimation of the target policy, that is, to transfer from a dynamics prediction model to the estimation of a policy in a control problem. Moreover, transfer learning can be used in situations where we have access to labeled data for a simple problem but would like to solve a more complex, but related, problem: for example, transfer from a single inverted pendulum to a double pendulum with the same dynamics, e.g. the same pole lengths, gravity, and friction coefficients. To this end, we present an approach to transfer learning through adaptation. Inspired by Gradient-Based Meta-Learning (GBML), we propose a method for meta-transfer learning in a general encoder-decoder model. This can be used independently of the shift between the support task and the query task and is agnostic to architectural changes between the meta-learner and the base-learner (see Figure 1).
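The central mechanism, using task-adapted parameters as a task representation that a learned map then translates into parameters for a different query task, can be illustrated on a toy pair of tasks. Everything below (the fixed linear map `T`, the derivative query task) is a simplifying assumption for illustration, not the paper's learned encoder-decoder architecture.

```python
import numpy as np

def inner_adapt(theta, X, y, lr=0.2, steps=300):
    """Inner-loop adaptation on the support task (linear model, squared loss)."""
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * 2.0 / len(X) * X.T @ (X @ theta - y)
    return theta

# Support task: fit y = a*x + b from a few samples; the adapted
# parameters approximate [a, b] and act as the task representation.
a, b = 2.0, -1.0
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=8)
X = np.stack([x, np.ones_like(x)], axis=1)
theta_support = inner_adapt(np.zeros(2), X, a * x + b)

# Hypothetical transfer map (a fixed linear map here; in the proposed
# setting it would be learned): it sends the support-task representation
# to the parameters of a related query task, e.g. the derivative
# y' = a, whose linear-model parameters are [0, a].
T = np.array([[0.0, 0.0],
              [1.0, 0.0]])
theta_query = T @ theta_support  # approximately [0.0, 2.0]
```

The point of the sketch is that no query-task data is touched: the query-task learner is produced entirely from the support-task adaptation plus the transfer map, which is what a shift affecting both domain and co-domain requires.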

