αVIL: LEARNING TO LEVERAGE AUXILIARY TASKS FOR MULTITASK LEARNING

Abstract

Multitask Learning is a Machine Learning paradigm that aims to train a range of (usually related) tasks with the help of a shared model. While the goal is often to improve the joint performance of all training tasks, another approach is to focus on the performance of a specific target task, while treating the remaining tasks as auxiliary data from which to leverage positive transfer towards the target during training. In such settings, it becomes important to estimate the positive or negative influence that auxiliary tasks will have on the target. While many methods have been proposed to estimate task weights before or during training, they typically rely on heuristics or extensive search of the weighting space. We propose a novel method called α-Variable Importance Learning (αVIL) that is able to adjust task weights dynamically during model training, by making direct use of task-specific updates of the underlying model's parameters between training epochs. Experiments indicate that αVIL is able to outperform other Multitask Learning approaches in a variety of settings. To our knowledge, this is the first attempt at making direct use of model updates for task weight estimation.

1. INTRODUCTION

In Machine Learning, we often encounter tasks that are at least similar, if not nearly identical. For example, in Computer Vision, multiple datasets might require object segmentation or recognition (Deng et al., 2009; LeCun et al., 1998; Lin et al., 2014), whereas in Natural Language Processing, tasks can deal with sentence entailment (De Marneffe et al., 2019) or paraphrase recognition (Quirk et al., 2004), both of which share similarities and fall under the category of Natural Language Understanding. Given that many such datasets are accessible to researchers, a naturally emerging question is whether we can leverage their commonalities in training setups.

Multitask Learning (Caruana, 1993) is a Machine Learning paradigm that aims to address the above by training a group of sufficiently similar tasks together. Instead of optimizing each individual task's objective, a shared underlying model is fit so as to maximize a global performance measure, for example a LeNet-like architecture (LeCun et al., 1998) for Computer Vision, or a Transformer-based encoder (Vaswani et al., 2017) for Natural Language Processing problems. For a broader perspective on Multitask Learning approaches, we refer the reader to the overviews of Ruder (2017); Vandenhende et al. (2020).

In this paper we introduce αVIL, an approach to Multitask Learning that estimates individual task weights through direct, gradient-based meta-optimization on a weighted accumulation of task-specific model updates. To our knowledge, this is the first attempt to leverage task-specific model deltas, that is, realized differences of model parameters before and after a task's training steps, to directly optimize task weights for target task-oriented multitask learning. We perform initial experiments on multitask setups in two domains, Computer Vision and Natural Language Understanding, and show that our method is able to successfully learn a good weighting of classification tasks.
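To make the idea concrete, the following is a minimal NumPy sketch of the delta-weighting scheme described above, not the paper's actual implementation: each epoch collects a task-specific model delta (the parameter change produced by one training step on that task), then meta-optimizes the task weights α by gradient descent on the target task's loss evaluated at the α-weighted accumulated update. The toy linear-regression tasks, step sizes, and inner-loop counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss(w, X, y):
    r = X @ w - y
    return float(r @ r / len(y))

def task_grad(w, X, y):
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Two synthetic regression tasks: task 0 is the target, task 1 an
# auxiliary task correlated with it (mirrors the target/auxiliary setup).
w_true = rng.normal(size=5)
X0 = rng.normal(size=(64, 5)); y0 = X0 @ w_true + 0.1 * rng.normal(size=64)
X1 = rng.normal(size=(64, 5)); y1 = X1 @ (w_true + 0.3) + 0.1 * rng.normal(size=64)
tasks = [(X0, y0), (X1, y1)]

w = np.zeros(5)          # shared model parameters
lr, meta_lr = 0.1, 0.05  # task-level and meta-level step sizes (assumed)

for epoch in range(20):
    # 1) realized task-specific deltas: one gradient step per task from w
    deltas = [-lr * task_grad(w, X, y) for X, y in tasks]

    # 2) meta-optimize task weights alpha on the *target* task's loss,
    #    evaluated at the alpha-weighted accumulated update
    alpha = np.ones(len(tasks))
    for _ in range(10):
        w_trial = w + sum(a * d for a, d in zip(alpha, deltas))
        g_w = task_grad(w_trial, X0, y0)               # grad w.r.t. trial params
        g_alpha = np.array([d @ g_w for d in deltas])  # chain rule: dL/d(alpha_k)
        alpha -= meta_lr * g_alpha

    # 3) commit the alpha-weighted combination of task deltas
    w = w + sum(a * d for a, d in zip(alpha, deltas))

print(task_loss(w, X0, y0))
```

In practice the deltas would come from full task-specific training steps of a shared neural model rather than single closed-form gradient steps, but the structure (collect deltas, tune α against the target loss, apply the weighted update) is the same.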

2. RELATED WORK

Multitask Learning (MTL) can be divided into techniques which aim to improve a joint performance metric for a group of tasks (Caruana, 1993), and methods which use auxiliary tasks to boost the performance of a single target task (Caruana, 1998; Bingel & Søgaard, 2017).

