LEARNING WITHOUT FORGETTING: TASK AWARE MULTI-TASK LEARNING FOR MULTI-MODALITY TASKS

Abstract

Existing Multi-Task Learning (MTL) strategies, such as joint or meta-learning, focus mainly on shared learning and leave little to no scope for task-specific learning. This creates the need for a distinct shared pretraining phase followed by a task-specific finetuning phase. The finetuning phase produces a separate model for each task, where improving the performance of a particular task necessitates forgetting some of the knowledge garnered for the other tasks. Humans, on the other hand, perform task-specific learning in synergy with general domain-based learning. Inspired by these learning patterns in humans, we propose a simple yet generic task-aware framework that can be incorporated into existing MTL strategies. The proposed framework computes task-specific representations to modulate the model parameters during MTL. Hence, it performs both shared and task-specific learning in a single phase, resulting in a single model for all the tasks. This single model achieves significant performance gains over existing MTL strategies. For example, we train a model on Speech Translation (ST), Automatic Speech Recognition (ASR), and Machine Translation (MT) tasks using the proposed task-aware multi-task learning approach. This single model achieves a BLEU score of 28.64 on ST (MuST-C English-German), a WER of 11.61 on ASR (TEDLium v3), and a BLEU score of 23.35 on MT (WMT14 English-German). This sets a new state-of-the-art (SOTA) on the ST task, outperforms existing end-to-end ASR systems, and achieves competitive performance on the MT task.

1. INTRODUCTION

The process of Multi-Task Learning (MTL) on a set of related tasks is inspired by the patterns displayed by human learning. It involves a pretraining phase over all the tasks, followed by a finetuning phase. During pretraining, the model tries to grasp the knowledge shared across all the tasks involved, while in the finetuning phase, task-specific learning is performed to improve performance. However, as a result of the finetuning phase, the model forgets the information about the other tasks that it learnt during pretraining. Humans, on the other hand, are less susceptible to such forgetting and retain existing knowledge and skills while mastering a new task. For example, a polyglot who masters a new language learns to translate from this language without losing the ability to translate other languages. Moreover, the lack of task-based flexibility and the separation into pretraining and finetuning phases cause gaps in the learning process for the following reasons:

Role Mismatch: Consider an MTL system trained to perform the Speech Translation (ST), Automatic Speech Recognition (ASR), and Machine Translation (MT) tasks. The encoder block plays a very different role in the standalone ASR, MT, and ST models, so a single encoder cannot be expected to perform well on all the tasks without any cues to identify or use task information. Moreover, the discrepancy between pretraining and finetuning hampers the MTL objective.

Task Awareness: At each step of MTL, the model tries to optimize for the task at hand. For tasks such as ST and ASR, which share the same source language, the model cannot identify the task from the input alone and alter its parameters accordingly, hence necessitating a finetuning phase. A few such examples are provided in Table 1. Humans, on the other hand, grasp the task they have to perform by means of context or explicit cues, as illustrated by the sketch below.
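To make the idea of task-aware modulation concrete, the following is a minimal PyTorch sketch of one way a learned task representation could modulate a shared layer so that ST, ASR, and MT examples can be mixed in a single training phase with one set of weights. The class name TaskAwareLayer, the task-id convention, and the FiLM-style scale-and-shift modulation are illustrative assumptions for exposition, not the paper's exact architecture.

```python
# Illustrative sketch (not the paper's exact architecture): a shared layer whose
# output is modulated by a task-specific representation, giving the model an
# explicit cue about which task it is currently solving.
import torch
import torch.nn as nn


class TaskAwareLayer(nn.Module):
    """Shared feed-forward layer modulated by a learned task embedding."""

    def __init__(self, d_model: int, num_tasks: int):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)           # shared across all tasks
        self.task_embed = nn.Embedding(num_tasks, d_model)   # task-specific representation
        self.to_scale = nn.Linear(d_model, d_model)           # produces per-feature scale
        self.to_shift = nn.Linear(d_model, d_model)           # produces per-feature shift

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); task_id: (batch,) integer task identifiers
        t = self.task_embed(task_id)                  # (batch, d_model)
        scale = self.to_scale(t).unsqueeze(1)         # (batch, 1, d_model)
        shift = self.to_shift(t).unsqueeze(1)         # (batch, 1, d_model)
        return scale * self.shared(x) + shift          # task-modulated shared computation


# Usage example with hypothetical task ids 0 = ST, 1 = ASR, 2 = MT
layer = TaskAwareLayer(d_model=512, num_tasks=3)
features = torch.randn(4, 100, 512)
out_st = layer(features, torch.zeros(4, dtype=torch.long))   # same weights,
out_asr = layer(features, torch.ones(4, dtype=torch.long))   # different task cue
```

Under this sketch, the shared linear weights are updated by every task, while the task embedding and the modulation heads let the same parameters behave differently per task, removing the need for a separate finetuning phase.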

