PARETO MANIFOLD LEARNING: TACKLING MULTIPLE TASKS VIA ENSEMBLES OF SINGLE-TASK MODELS

Abstract

In Multi-Task Learning, tasks may compete and limit the performance achieved on each other, rather than guiding the optimization trajectory to a common solution superior to its single-task counterparts. There is often not a single solution that is optimal for all tasks, leading practitioners to balance tradeoffs between tasks' performance and to resort to optimality in the Pareto sense. Current Multi-Task Learning methodologies either completely neglect this aspect of functional diversity, producing one solution on the Pareto Front predefined by their optimization schemes, or produce diverse but discrete solutions, each requiring a separate training run. In this paper, we conjecture that there exist Pareto Subspaces, i.e., weight subspaces where multiple optimal functional solutions lie. We propose Pareto Manifold Learning, an ensembling method in weight space that is able to discover such a parameterization and produces a continuous Pareto Front in a single training run, allowing practitioners to modulate the performance on each task during inference on the fly. We validate the proposed method on a diverse set of multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, and show that Pareto Manifold Learning outperforms state-of-the-art algorithms.

Introduction

In Multi-Task Learning (MTL), multiple tasks are learned concurrently within a single model, striving to infuse inductive bias that will help outperform the single-task baselines. Apart from the promise of superior performance and some theoretical benefits (Ruder, 2017), such as generalization properties for the learned representation, modeling multiple tasks jointly has practical benefits as well, e.g., lower inference times and memory requirements. However, building machine learning models presents a multifaceted host of decisions for multiple and often competing objectives, such as model complexity, runtime and generalization. Conflicts arise since optimizing for one metric often leads to the deterioration of others. A single solution optimally satisfying all objectives rarely exists, and practitioners must balance the inherent trade-offs.

In contrast to single-task learning, where one metric governs the comparison between methods (e.g., top-1 accuracy on ImageNet), multiple models can be optimal in Multi-Task Learning; e.g., model X yields superior performance on task A compared to model Y, but the reverse holds true for task B; thus, there is not a single better model among the two. This notion of tradeoffs is formally defined as Pareto optimality. Intuitively, improvement on an individual task's performance can come only at the expense of another task. However, there exists no framework addressing the need for efficient construction of the Pareto Front, i.e., the set of all Pareto optimal solutions.

Recent methods in Multi-Task Learning cast the problem through the lens of multi-objective optimization and introduced the concept of Pareto optimality, resulting in different mechanisms for computing the descent direction for the shared parameters. Specifically, Sener & Koltun (2018) produce a single solution that lies on the Pareto Front. As an optimization scheme, however, it is biased towards the task with the smallest gradient magnitude, as argued in Liu et al. (2020). Lin et al. (2019) expand this idea by imposing additional constraints on the objective space to produce multiple solutions on the Pareto Front, each corresponding to a different user-specified tradeoff. Finally, the work by Ma et al. (2020) proposes an orthogonal approach that can be applied after training: it starts with a discrete solution set and produces a continuous set (in weight space) around each solution.
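To make the notion of Pareto optimality concrete, the following is a minimal sketch of identifying the Pareto Front among a finite candidate set; the models and their per-task accuracies are hypothetical, not results from this paper.

```python
# Minimal sketch: find the Pareto-optimal models from per-task scores.
# Scores are hypothetical accuracies (higher is better) on tasks A and B.

def dominates(u, v):
    """u dominates v if u is at least as good on every task
    and strictly better on at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(models):
    """Return the models not dominated by any other model."""
    return {
        name: scores
        for name, scores in models.items()
        if not any(dominates(other, scores)
                   for other_name, other in models.items() if other_name != name)
    }

models = {
    "X": (0.92, 0.70),  # better on task A
    "Y": (0.85, 0.81),  # better on task B
    "Z": (0.80, 0.65),  # dominated by both X and Y
}
print(sorted(pareto_front(models)))  # → ['X', 'Y']
```

Here X and Y are incomparable (each wins on one task), so both lie on the front, while Z is dominated and excluded; this is exactly the sense in which "there is not a single better model among the two."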


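The continuous-Pareto-Front idea sketched in the abstract, i.e., a weight-space parameterization whose points trade off the tasks, can be illustrated with a toy linear interpolation between two endpoint weight vectors. This is a hedged sketch under simplifying assumptions, not the paper's implementation: the endpoints `theta_a`, `theta_b` and the quadratic losses below are hypothetical stand-ins for trained single-task solutions and real task objectives.

```python
import numpy as np

def interpolate(theta_a, theta_b, alpha):
    """Weights of the ensemble member at position alpha in [0, 1]."""
    return (1 - alpha) * theta_a + alpha * theta_b

# Toy objectives: task A prefers weights near theta_a, task B near theta_b.
theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 1.0])
loss_a = lambda w: float(np.sum((w - theta_a) ** 2))
loss_b = lambda w: float(np.sum((w - theta_b) ** 2))

# Sweeping alpha traces a continuous curve of tradeoffs; at inference time,
# a practitioner picks alpha on the fly to favor one task over the other.
for alpha in (0.0, 0.5, 1.0):
    w = interpolate(theta_a, theta_b, alpha)
    print(f"alpha={alpha:.1f}  loss_A={loss_a(w):.2f}  loss_B={loss_b(w):.2f}")
```

In this toy setting, `alpha=0.0` is best for task A, `alpha=1.0` for task B, and intermediate values give intermediate tradeoffs; a single trained subspace thus exposes a continuum of solutions without separate training runs per tradeoff.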