AN EVOLUTIONARY APPROACH TO DYNAMIC INTRODUCTION OF TASKS IN LARGE-SCALE MULTITASK LEARNING SYSTEMS

Abstract

Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. However, state-of-the-art ML models rely on high customization for each task and leverage size and data scale rather than scaling the number of tasks. Moreover, continual learning, which adds a temporal dimension to multitask learning, is often studied through the lens of common pitfalls such as catastrophic forgetting rather than at large scale, as a critical component for building the next generation of artificial intelligence. We propose an evolutionary method capable of generating large-scale multitask models that support the dynamic addition of new tasks. The generated multitask models are sparsely activated and integrate task-based routing that guarantees bounded compute cost and fewer added parameters per task as the model expands. The proposed method relies on a knowledge compartmentalization technique to achieve immunity against catastrophic forgetting and other common pitfalls such as gradient interference and negative transfer. We demonstrate empirically that the proposed method can jointly solve and achieve competitive results on 69 public image classification tasks; for example, it improves the state of the art on a competitive benchmark such as CIFAR-10 with a 15% relative error reduction compared to the best model trained on public data.

1. INTRODUCTION

The success of machine learning continues to grow as it finds new applications in areas as diverse as language generation (Brown et al., 2020), visual art generation (Ramesh et al., 2021), chip design (Mirhoseini et al., 2020), protein folding (Senior et al., 2020) and competitive games (Silver et al., 2016; Vinyals et al., 2019). The vast majority of machine learning models are designed and trained for a single task and a specific data modality, and are often trained from randomly initialized parameters or with limited knowledge transfer from a pre-trained model. While this paradigm has shown great success, it consumes a large amount of computational resources and does not leverage knowledge transfer across many related tasks to achieve higher performance and efficiency. The work presented in this paper is based on the intuition that significant advances can be enabled by dynamic, continual learning approaches capable of achieving knowledge transfer across a very large number of tasks. The method described in this paper can dynamically incorporate new tasks into a large running system, can leverage pieces of a sparse multitask ML model to achieve improved quality for new tasks, and can automatically share pieces of the model among related tasks. This method can enhance the quality achieved on each task and also improve efficiency in terms of convergence time, number of training examples, energy consumption and human engineering effort. The ML problem framing proposed by this paper can be interpreted as a generalization and synthesis of the standard multitask and continual learning formulations, since an arbitrarily large set of tasks can be solved jointly and, over time, the set of tasks can be extended with a continuous stream of new ones. Furthermore, it lifts the distinction between a pretraining task and a downstream task. As new tasks are incorporated, the system searches for how to combine the knowledge and representations already present in the system with new model capacity in order to achieve high quality on each new task.
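To make this problem framing concrete, the following is a minimal sketch, in PyTorch, of a sparsely activated multitask model with per-task routing over a shared pool of components. It illustrates only the general idea of bounded per-task compute and incremental task addition; it is not the evolutionary method introduced in this paper, and all names (SparseMultitaskModel, add_task, the routing scheme) are illustrative assumptions rather than the authors' API.

```python
# Minimal sketch (not the paper's algorithm): each task activates only a small,
# task-specific route through a shared pool of layer components, so per-task
# compute stays bounded even as the pool grows with new tasks.
import torch
import torch.nn as nn


class SparseMultitaskModel(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.components = nn.ModuleList()        # shared, growing pool of layers
        self.routes: dict[str, list[int]] = {}   # per-task ordered component indices
        self.heads = nn.ModuleDict()             # per-task output heads
        self.hidden_dim = hidden_dim

    def add_task(self, task: str, num_classes: int, reuse: list[int], num_new: int):
        """Register a new task: reuse some existing components, add a few new ones.

        In a full system the reused components could be frozen (knowledge
        compartmentalization) so that training a new task cannot degrade
        previously learned tasks.
        """
        for _ in range(num_new):
            self.components.append(
                nn.Sequential(nn.LazyLinear(self.hidden_dim), nn.ReLU())
            )
        new_ids = list(range(len(self.components) - num_new, len(self.components)))
        self.routes[task] = reuse + new_ids
        self.heads[task] = nn.Linear(self.hidden_dim, num_classes)

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        h = x.flatten(1)
        for idx in self.routes[task]:  # only this task's route is activated
            h = self.components[idx](h)
        return self.heads[task](h)


# Usage: tasks are introduced one at a time; a later task may reuse components
# learned for an earlier one (knowledge transfer) while adding limited capacity.
model = SparseMultitaskModel(hidden_dim=128)
model.add_task("cifar10", num_classes=10, reuse=[], num_new=2)
model.add_task("cifar100", num_classes=100,
               reuse=model.routes["cifar10"][:1], num_new=1)
logits = model(torch.randn(4, 3, 32, 32), task="cifar100")
print(logits.shape)  # torch.Size([4, 100])
```

In this toy framing, introducing a task adds only a bounded number of new components and one head, while the route length (and hence per-example compute for that task) stays fixed regardless of how many tasks the system already serves.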

