PROGRESSIVE PROMPTS: CONTINUAL LEARNING FOR LANGUAGE MODELS

Abstract

We introduce Progressive Prompts, a simple and efficient approach for continual learning in language models. Our method allows forward transfer and resists catastrophic forgetting, without relying on data replay or a large number of task-specific parameters. Progressive Prompts learns a new soft prompt for each task and sequentially concatenates it with the previously learned prompts, while keeping the base model frozen. Experiments on standard continual learning benchmarks show that our approach outperforms state-of-the-art methods, with an improvement of >20% in average test accuracy over the previous best-performing method on the T5 model. We also explore a more challenging continual learning setup with longer sequences of tasks and show that Progressive Prompts significantly outperforms prior methods.

1. INTRODUCTION

Learning a long sequence of tasks while gaining experience and avoiding forgetting remains a key feature of human-level intelligence. Although pretrained language models have largely succeeded in learning individual tasks, their performance degrades in scenarios where multiple tasks are encountered sequentially, a setting known as continual learning (CL) (de Masson D'Autume et al., 2019; Huang et al., 2021). Two major challenges arise in CL: (1) avoiding catastrophic forgetting, i.e., loss of the knowledge acquired from previous tasks after learning new ones (McCloskey & Cohen, 1989; Ratcliff, 1990), and (2) allowing forward transfer, i.e., leveraging the knowledge from past tasks for efficient learning of new tasks.

Typical CL approaches for language models train a single model on all tasks, which enables forward transfer but also leads to forgetting. These methods use data replay or add regularization constraints (Huang et al., 2021; de Masson D'Autume et al., 2019; Sun et al., 2019), but they still suffer from forgetting due to inevitable changes in parameters shared between tasks. Other approaches, such as progressive networks (Rusu et al., 2016), can eliminate catastrophic forgetting entirely while supporting forward transfer, but are computationally expensive because they add a new copy of the model for each task. This is especially intractable for large-scale language models with billions of parameters, which have become a standard in the NLP field (Zhang et al., 2022).

In this paper, we introduce Progressive Prompts, a novel CL approach for language models that supports forward transfer without forgetting. Our method is inspired by progressive networks, but is significantly more memory-efficient because it only learns a fixed number of tokens, or prompt, for each new task.
Learning a prompt to adapt a language model to a single downstream task was introduced in prompt tuning (Lester et al., 2021), and was shown to match the performance of full model finetuning while training <0.01% of the parameters. In Progressive Prompts, we learn a separate prompt for each incoming task and sequentially concatenate it with previously learned prompts. Importantly, we share input tokens across all tasks and progressively prepend new prompts while keeping previous prompts frozen (see Figure 1). Our method can: 1) alleviate catastrophic
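The core mechanics described above can be sketched in a few lines. The following is an illustrative simplification (not the authors' code, and the class and method names are our own): each task gets a fresh trainable soft prompt, all earlier prompts are frozen, and at the input layer the prompts are concatenated, newest first, in front of the shared input token embeddings.

```python
import numpy as np

class ProgressivePromptsSketch:
    """Hypothetical sketch of progressive prompt concatenation.

    One soft prompt (a matrix of prompt_len x embed_dim virtual token
    embeddings) is learned per task; prompts from earlier tasks are frozen,
    and the base model (not shown) stays frozen throughout.
    """

    def __init__(self, embed_dim, prompt_len=10, seed=0):
        self.embed_dim = embed_dim
        self.prompt_len = prompt_len
        self.rng = np.random.default_rng(seed)
        self.prompts = []    # learned soft prompts, one per task
        self.trainable = []  # True only for the current task's prompt

    def start_new_task(self):
        # Freeze every previously learned prompt ...
        self.trainable = [False] * len(self.prompts)
        # ... and append a fresh, trainable prompt for the new task.
        self.prompts.append(
            0.02 * self.rng.standard_normal((self.prompt_len, self.embed_dim))
        )
        self.trainable.append(True)

    def build_input(self, input_embeds):
        # input_embeds: (seq_len, embed_dim) shared input token embeddings.
        # Prepend all prompts, newest task's prompt first.
        parts = list(reversed(self.prompts)) + [input_embeds]
        return np.concatenate(parts, axis=0)
```

Because only the newest prompt receives gradient updates, earlier tasks' prompts (and hence their learned behavior) cannot be overwritten, while the new prompt can still attend to the frozen ones, which is what permits forward transfer.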

