PUTTING THEORY TO WORK: FROM LEARNING BOUNDS TO META-LEARNING ALGORITHMS

Abstract

Most existing deep learning models rely on large amounts of labeled training data in order to achieve state-of-the-art results, even though such data can be hard or costly to obtain in practice. One attractive alternative is to learn with little supervision, commonly referred to as few-shot learning (FSL), and, in particular, meta-learning, which learns how to learn from few data by leveraging related tasks. Despite the practical success of meta-learning, many of the algorithmic solutions proposed in the literature are based on sound intuitions but lack a solid theoretical analysis of the expected performance on the test task. In this paper, we review the recent advances in meta-learning theory and show how they can be used in practice both to better understand the behavior of popular meta-learning algorithms and to improve their generalization capacity. The latter is achieved by integrating the theoretical assumptions ensuring efficient meta-learning in the form of regularization terms into several popular meta-learning algorithms, whose behavior we study extensively on classic few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of meta-learning theory into practice for the task of few-shot classification.

1. INTRODUCTION

Since the inception of the machine learning field, its algorithmic advances have inevitably been followed or preceded by accompanying theoretical analyses establishing the conditions required for the corresponding algorithms to learn well. Such a synergy between theory and practice is reflected in numerous concepts and learning strategies that originated in statistical learning theory: for instance, the famous regularized risk minimization approach is directly related to the minimization of the complexity of the hypothesis space, as suggested by the generalization bounds established for supervised learning (Vapnik, 1992), while most of the adversarial algorithms in transfer learning (e.g., DANN from (Ganin & Lempitsky, 2015)) follow the theoretical insights provided by the seminal theory of domain adaptation (Ben-David et al., 2010). Even though many machine learning methods now enjoy a solid theoretical justification, some more recent advances in the field are still at a preliminary stage, and the hypotheses put forward by their theoretical studies remain to be implemented and verified in practice. One such notable example is the emerging field of meta-learning, also called learning to learn (LTL), where the goal is to produce a model on data coming from a set of (meta-train) source tasks in order to use it as a starting point for successfully learning a new, previously unseen (meta-test) target task with little supervision. This kind of approach comes in particularly handy when training deep learning models, as their performance crucially depends on the amount of training data, which can be difficult and/or expensive to obtain in some applications. Several theoretical studies (Baxter, 2000; Pentina & Lampert, 2014; Maurer et al., 2016; Amit & Meir, 2018; Yin et al., 2020)¹ provided probabilistic meta-learning bounds that require the amount of data in the meta-train source tasks and the number of meta-train tasks to tend to infinity for efficient meta-learning.
While capturing the underlying general intuition, these bounds do not suggest that all the source data is useful in such a learning setup due to the



¹ We omit other works on meta-learning via online convex optimization (Finn et al., 2019; Balcan et al., 2019; Khodak et al., 2019; Denevi et al., 2019) as they concern a different learning setup.
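The episodic meta-learning setup described above, where meta-training on a set of source tasks yields an initialization that adapts quickly to an unseen target task from few labeled points, can be sketched with a first-order MAML-style loop. This is a minimal illustration, not any of the algorithms studied in this paper; the toy 1-D regression tasks and all names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_data(slope, n):
    """Draw n labeled points from a toy 1-D regression task y = slope * x."""
    x = rng.uniform(-1.0, 1.0, n)
    return x, slope * x

def loss(w, x, y):
    """Mean squared error of the scalar model w * x."""
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    """Gradient of the loss with respect to w."""
    return np.mean(2.0 * (w * x - y) * x)

def fomaml(meta_iters=500, alpha=0.1, beta=0.05):
    """Meta-learn an initialization w with first-order MAML updates."""
    w = 5.0  # deliberately poor starting point
    for _ in range(meta_iters):
        slope = rng.uniform(-2.0, 2.0)           # sample a meta-train source task
        xs, ys = task_data(slope, 5)             # few-shot support set
        xq, yq = task_data(slope, 5)             # query set
        w_adapted = w - alpha * grad(w, xs, ys)  # inner-loop adaptation
        w -= beta * grad(w_adapted, xq, yq)      # first-order outer update
    return w

w0 = fomaml()
# Adapt to an unseen meta-test task (slope = 1.5) from only 5 labeled points.
xs, ys = task_data(1.5, 5)
w_new = w0 - 0.1 * grad(w0, xs, ys)
```

Note that full MAML differentiates through the inner-loop update, whereas the first-order variant sketched here drops those second-order terms; the key point is only the two-level structure of episodic meta-training.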

