ENHANCING META LEARNING VIA MULTI-OBJECTIVE SOFT IMPROVEMENT FUNCTIONS

Abstract

Meta-learning tries to leverage information from similar learning tasks. In the commonly-used bilevel optimization formulation, the shared parameter is learned in the outer loop by minimizing the average loss over all tasks. However, the converged solution may be compromised in that it focuses on optimizing only a small subset of tasks. To alleviate this problem, we consider meta-learning as a multi-objective optimization (MOO) problem, in which each task is an objective. However, existing MOO solvers need to access all the objectives' gradients in each iteration, and cannot scale to the huge number of tasks in typical meta-learning settings. To address this scalability issue, we propose a gradient-based solver that operates on mini-batches of tasks. We provide theoretical guarantees on the Pareto optimality or Pareto stationarity of the converged solution. Empirical studies on various machine learning settings demonstrate that the proposed method is efficient and achieves better performance than the baselines, particularly by improving the poorly-performing tasks and thus alleviating the compromising phenomenon.

1. INTRODUCTION

Meta-learning, also known as "learning to learn", aims to enable models to learn more effectively by leveraging information from many similar learning tasks (Hospedales et al., 2020). In recent years, meta-learning has received much attention for its fast adaptation to new learning scenarios with limited data (Kao et al., 2021; Finn et al., 2017; Snell et al., 2017; Lee et al., 2019; Nichol et al., 2018; Deleu et al., 2022; Rajeswaran et al., 2019; Vilalta & Drissi, 2002). It is usually formulated as a bilevel optimization problem (Franceschi et al., 2018; Hong et al., 2020), which finds task-specific parameters in the inner level and minimizes the average loss over tasks in the outer level. Recently, Wang et al. (2021) reformulated meta-learning as a multi-task learning problem. From this perspective, minimizing the average loss in the outer level using (stochastic) gradient descent may not always be desirable. Specifically, it may suffer from the compromising (or conflicting) phenomenon, in which the converged solution focuses on minimizing the losses of only a small subset of tasks while ignoring the others (Yu et al., 2020; Liu et al., 2021a; Sener & Koltun, 2018). This compromised solution may thus lead to poor performance. To alleviate this problem, we propose reformulating meta-learning as a multi-objective optimization (MOO) problem, in which each task is an objective. The performance of all tasks (objectives) is then considered during optimization (Emmerich & Deutz, 2018). A popular class of MOO solvers is the gradient-based approach (Liu et al., 2021a; Yu et al., 2020; Sener & Koltun, 2018; Navon et al., 2022; Liu et al., 2021b), with prominent examples such as the multiple-gradient descent algorithm (MGDA) (Désidéri, 2012; Sener & Koltun, 2018), PCGrad (Yu et al., 2020), and CAGrad (Liu et al., 2021a).
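To make the bilevel formulation concrete, the following sketch meta-trains a linear model: the inner level adapts the shared parameter to each task with one gradient step on a support set, and the outer level minimizes the average post-adaptation query loss over a mini-batch of tasks. This uses a first-order approximation of the meta-gradient (a FOMAML-style simplification, not this paper's method); the task distribution, dimensions, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                  # parameter dimension (toy choice)
w0 = np.array([1.0, -2.0, 0.5])        # shared structure across tasks (assumed)

def loss_and_grad(w, X, y):
    """Squared-error loss of a linear model and its gradient."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def sample_task(n=8):
    """Toy task: linear regression whose target is a noisy copy of w0."""
    w_true = w0 + 0.1 * rng.normal(size=d)
    Xs, Xq = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    return Xs, Xs @ w_true, Xq, Xq @ w_true

def adapted_query_loss(w, task, inner_lr=0.1):
    """Inner level: one gradient step on the support set, then query loss."""
    Xs, ys, Xq, yq = task
    _, g_s = loss_and_grad(w, Xs, ys)
    w_i = w - inner_lr * g_s           # task-specific adapted parameter
    return loss_and_grad(w_i, Xq, yq)

def avg_adapted_loss(w, n_tasks=50):
    return float(np.mean([adapted_query_loss(w, sample_task())[0]
                          for _ in range(n_tasks)]))

w = np.zeros(d)                        # shared meta-parameter
loss_before = avg_adapted_loss(w)
for step in range(500):
    meta_grad = np.zeros(d)
    for _ in range(4):                 # mini-batch of tasks
        _, g_q = adapted_query_loss(w, sample_task())
        meta_grad += g_q               # first-order meta-gradient
    w -= 0.05 * meta_grad / 4          # outer level: average query loss
loss_after = avg_adapted_loss(w)
```

Note that the outer update averages the task gradients; when task gradients conflict, this average can be dominated by a few tasks, which is exactly the compromising phenomenon discussed above.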
In each iteration, these methods find a common descent direction among all objective gradients, instead of simply optimizing the average performance over all objectives. However, existing gradient-based MOO methods require using gradients from all the objectives. When formulating meta-learning as a MOO problem with each task being an objective, computing all these gradients in each iteration can become very expensive, as the number of objectives (i.e., tasks) in typical meta-learning settings is huge.
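For two objectives, the common descent direction computed by MGDA has a closed form: the minimum-norm point in the convex hull of the two gradients (Désidéri, 2012; Sener & Koltun, 2018). The sketch below illustrates this two-task case only (the general case requires a quadratic program); variable names are illustrative.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Min-norm point in the convex hull of two gradients (two-task MGDA).

    Solves min_{a in [0,1]} ||a*g1 + (1-a)*g2||^2 in closed form. Stepping
    along the negative of the returned vector (when nonzero) decreases
    both objectives, since it has nonnegative inner product with each gradient.
    """
    diff = g1 - g2
    denom = diff @ diff
    a = 0.5 if denom == 0 else float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return a * g1 + (1 - a) * g2

# two conflicting task gradients
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d_star = min_norm_direction(g1, g2)   # -> [0.5, 0.5]
```

In contrast, plain averaging of gradients can yield a direction that increases one objective; the min-norm direction satisfies d·g_i ≥ ||d||² for every objective i, which is what makes it a common descent direction.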

