COMPOSITIONAL TASK REPRESENTATIONS FOR LARGE LANGUAGE MODELS

Abstract

Large language models have shown a remarkable cross-task generalization ability. Most prior works assumed that prompts effectively extract knowledge from language models to facilitate generalization to new tasks. This perspective led to numerous studies on improving prompts. In contrast, we introduce a new perspective, compositional generalization, that views each task as a composition of latent codes and generalizes to test tasks by a new composition of seen codes. To this end, we propose a novel prompt-free approach, Compositional Task Representations (CTR), that employs multi-task training to learn a discrete, compositional codebook. Empirically, our CTR substantially outperforms prompt-based methods in zero-label learning on average. According to our analysis, some of the learned CTR codes are interpretable to humans and demonstrate a certain degree of controllability.

1. INTRODUCTION

Large language models (LLMs) have shown remarkable performance in cross-task generalization. Without using any labeled data for the target task, GPT-3 (Brown et al., 2020) obtains reasonable performance on a wide range of tasks. Later extensions such as FLAN (Wei et al., 2022) and T0 (Sanh et al., 2022) continue training the LLMs on a large number of supervised tasks, which further improves cross-task generalization performance. The aforementioned studies share an important assumption: that natural language prompts extract knowledge from LLMs to facilitate generalization to new tasks. In this direction, numerous studies have focused on different aspects of improving prompt-based learning, such as designing better prompts (Xu et al., 2022), increasing the number of prompts (Wang et al., 2022; Aribandi et al., 2022), and improving the training efficiency of prompts (Lester et al., 2021).

In contrast, we explore an alternative perspective on cross-task generalization: compositional generalization. Specifically, we explore whether it is possible to represent tasks as discrete compositions of latent codes. This perspective offers several potential benefits. First, since the latent codes are trained on seen tasks, we expect strong cross-task generalization because new tasks can also be represented as compositions of these trained codes. Second, it provides a way to analyze and understand cross-task generalization by investigating the association between tasks and the learned representations. Third, its built-in compositionality has the potential to make task generalization more controllable than prompts allow.

Motivated by these potentials, we propose a new method, Compositional Task Representations (CTR), that employs multi-task training to learn a discrete, compositional codebook.
Specifically, given a large number of training tasks, we use an encoder to map each randomly initialized task embedding to a fixed-length sequence of query vectors. Each query vector is used to retrieve a code from a codebook, which is formulated as an embedding lookup table. This produces
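The encoding-and-retrieval step described above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions: the single linear map standing in for the encoder, the nearest-neighbor retrieval rule, and all names and dimensions are illustrative, not the paper's exact design.

```python
import numpy as np

def retrieve_codes(task_emb, proj, codebook):
    """Map one task embedding to a fixed-length sequence of discrete codes.

    task_emb : (d,)       a randomly initialized task embedding
    proj     : (nq*d, d)  stand-in encoder (a single linear map; illustrative)
    codebook : (K, d)     the codebook, formulated as an embedding lookup table
    """
    d = codebook.shape[1]
    # Encoder output reshaped into a fixed-length sequence of query vectors.
    queries = (proj @ task_emb).reshape(-1, d)                    # (nq, d)
    # Each query retrieves its nearest code from the codebook (an assumption;
    # any differentiable or discrete selection rule could play this role).
    dists = np.linalg.norm(queries[:, None, :] - codebook[None, :, :], axis=-1)
    idx = dists.argmin(axis=1)                                    # (nq,) code indices
    return idx, codebook[idx]                                     # indices + code vectors

# Toy demo: d=8-dim embeddings, nq=4 query vectors, K=16 codes.
rng = np.random.default_rng(0)
d, nq, K = 8, 4, 16
task_emb = rng.normal(size=d)
proj = rng.normal(size=(nq * d, d))
codebook = rng.normal(size=(K, d))
idx, vecs = retrieve_codes(task_emb, proj, codebook)
```

In a multi-task setting, the encoder and codebook would be learned jointly across the seen tasks, so that a held-out task can be represented by a new composition of the same code indices.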

