HYPER: MULTITASK HYPER-PROMPTED TRAINING ENABLES LARGE-SCALE RETRIEVAL GENERALIZATION

Abstract

Recently, large-scale text retrieval has made impressive progress, facilitating both information retrieval and downstream knowledge-intensive tasks (e.g., open-domain QA and dialogue). With a moderate amount of data, a neural text retriever can outperform traditional methods such as BM25 by a large margin. However, when applied to out-of-domain data 1 , the performance of a neural retriever degrades considerably. Therefore, enabling a retriever to perform robustly across different domains or tasks, and even to show strong zero-shot transfer ability, is critical for building scalable IR systems. To this end, we propose HYPER, a hyper-prompted training mechanism that enables uniform retrieval across tasks from different domains. Specifically, our approach jointly trains the query encoder with a shared prompt-based parameter pool and a prompt synthesizer that dynamically composes a hyper-prompt for encoding each query from different tasks or domains. Besides, to avoid mode collapse of the prompt attention distributions for different queries, we design a contrastive prompt regularization that encourages the prompt attention to be aligned for queries of the same task and uniform across tasks. Through multi-task hyper-prompted training, our retriever masters the ability to dynamically represent different types of queries and to transfer knowledge across different domains and tasks. Extensive experiments show that our model attains better retrieval performance across different tasks and better zero-shot transfer ability than various previous methods.

1. INTRODUCTION

Large-scale retrieval aims to retrieve relevant documents from a collection of millions to billions of documents according to a given query, which is known as first-stage retrieval (Cai et al., 2021). It can significantly benefit various knowledge-intensive tasks (Guu et al., 2020; Lewis et al., 2020), since the retrieved documents contain explicit knowledge of the world (Petroni et al., 2021). Traditional term-matching methods such as tf-idf and BM25 (Yang et al., 2017) can perform retrieval effectively by building an inverted index and work fairly well regardless of domain; however, recent neural retrievers outperform them by a large margin given a moderate amount of task-specific data (Karpukhin et al., 2020; Formal et al., 2021b; Khattab & Zaharia, 2020). A common approach to neural retrieval is to use pre-trained language models (e.g., BERT) (Devlin et al., 2019) to encode queries and documents into vectors separately, which is known as the Bi-Encoder.

Although neural retrievers can be optimized effectively with samples from a specific task, in real-world applications the formats of queries differ and the expected properties of query vectors vary considerably from task to task. For example, in the Natural Questions dataset (Kwiatkowski et al., 2019), a query such as "what was the first capital city of Australia" is a simple question sentence, whereas in the Wizard of Wikipedia dataset (Dinan et al., 2018), a query such as "...Snoop Dogg is so awesome, he's a great rapper and does a lot for his community as well..." contains multiple declarative sentences with an implicit retrieval target. Beyond query formats, different tasks also require query vectors with different richness or intents: in the HotpotQA dataset (Yang et al., 2018), the query "which game was published first, Near and Far or King of Tokyo?" requires retrieving documents relevant to both mentioned items, which is fairly different from queries in Natural Questions that require retrieving specific facts about a single item. These differences between tasks cause significant performance degradation when a model trained on one task is applied to another. Moreover, recently popular tasks often suffer from data sparsity (Almeida & Matos, 2020), which calls for better generalization from a neural retriever (Thakur et al., 2021).

To resolve the above challenges, we aim to build a universal model capable of processing queries uniformly regardless of the differences between tasks, including varying input query formats and the unique features of query vectors required by specific tasks. Meanwhile, we expect our model to obtain stronger generalization ability, reflected by promising zero-shot and few-shot performance in large-scale retrieval. The first problem is how to enable universal query processing. For a neural retriever, the ability to resolve a specific task corresponds to a set of parameters trained on that task. Although one can train a separate model for each task (Karpukhin et al., 2020) or simply use a shared encoder in a multi-task setting (Maillard et al., 2021), the former incurs a heavy parameter cost while the latter can yield inferior generalization. To this end, we propose HYPER, a multi-task HYPEr-prompted training mechanism that can be combined with any transformer-based neural Retriever. HYPER consists of two key components. The first is the Query-conditional Prompt Synthesizer (QPS), which leverages an attention module to synthesize suitable query-encoder parameters for different queries, enabling our query encoder to dynamically represent different types of queries and, through multi-task training, to transfer learned parameters across different tasks and domains.
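While the paper's QPS operates on learned prompt parameters inside a transformer encoder, the core mechanism it describes — attention over a shared prompt pool, conditioned on the query — can be sketched in a few lines. All names, shapes, and the projection used below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def synthesize_prompt(query_vec, prompt_pool, key_proj):
    """Compose a hyper-prompt as an attention-weighted mix of shared prompts.

    query_vec:   (d,)   pooled representation of the input query
    prompt_pool: (m, d) m shared, trainable prompt vectors
    key_proj:    (d, d) projection producing keys for the pool
    Returns the synthesized prompt (d,) and the attention weights (m,).
    """
    keys = prompt_pool @ key_proj                        # (m, d)
    scores = keys @ query_vec / np.sqrt(len(query_vec))  # scaled dot-product
    attn = softmax(scores)                               # (m,) query-conditional weights
    prompt = attn @ prompt_pool                          # (d,) synthesized hyper-prompt
    return prompt, attn

# toy dimensions: 4 shared prompts in an 8-dim space
rng = np.random.default_rng(0)
d, m = 8, 4
q = rng.normal(size=d)
pool = rng.normal(size=(m, d))
proj = rng.normal(size=(d, d))
prompt, attn = synthesize_prompt(q, pool, proj)
```

In the full model the synthesized prompt would be prepended (as a prefix) to the transformer's inputs or key/value states, so that queries from different tasks effectively run through differently parameterized encoders while sharing one backbone.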
Nevertheless, we find that merely applying QPS results in mode collapse of the attention score distributions, which causes our query encoder to fail to learn distinct abilities for processing queries from different tasks. To address this problem, we propose Contrastive Prompt Regularization (CPR), which encourages the synthesized parameters for queries of the same task to be similar, for better training effectiveness, while promoting the query encoder to distinguish queries of different tasks, thus avoiding mode collapse. Through the above multi-task hyper-prompted training, HYPER masters the ability to dynamically represent different types of queries and to transfer knowledge across different domains and tasks. Therefore, HYPER enables large-scale retrieval generalization in zero-shot and few-shot scenarios. To conclude, our contributions are three-fold: i) we present HYPER, a multi-task hyper-prompted training mechanism that enables a neural retriever to dynamically process different types of queries with different hyper-prompts and to transfer learned knowledge across different domains and tasks; ii) to address the obstacles to uniform retrieval in model construction and optimization, we propose the Query-conditional Prompt Synthesizer (QPS) along with Contrastive Prompt Regularization (CPR) to synthesize suitable prompts for different queries; iii) experiments on zero-shot in-domain and cross-domain retrieval tasks reflect the superior generalization provided by HYPER, and strong multi-task performance indicates the achievement of uniform retrieval.
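The exact form of the CPR loss is not given in this excerpt; one natural reading is an InfoNCE-style contrastive objective over the prompt-attention distributions, where queries from the same task are positives and queries from other tasks are negatives. The sketch below is a hypothetical instantiation under that assumption:

```python
import numpy as np

def cpr_loss(attn, task_ids, tau=0.1):
    """Contrastive regularization over prompt-attention distributions.

    attn:     (n, m) prompt-attention weights for n queries over m prompts
    task_ids: (n,)   task label for each query
    Same-task pairs are pulled together, cross-task pairs pushed apart,
    discouraging all queries from collapsing onto one attention mode.
    """
    a = attn / np.linalg.norm(attn, axis=1, keepdims=True)  # L2-normalize rows
    sim = (a @ a.T) / tau                                   # temperature-scaled cosine sim
    n = len(task_ids)
    total, cnt = 0.0, 0
    for i in range(n):
        pos = (task_ids == task_ids[i]) & (np.arange(n) != i)
        if not pos.any():
            continue
        others = np.arange(n) != i                          # exclude self from denominator
        log_den = np.log(np.exp(sim[i][others]).sum())
        total += (log_den - sim[i][pos]).mean()             # -log softmax of positives
        cnt += 1
    return total / max(cnt, 1)

# sanity check: attention aligned by task should score lower (better)
# than the same attention patterns with mismatched task labels
patterns = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
aligned = cpr_loss(patterns, np.array([0, 0, 1, 1]))
shuffled = cpr_loss(patterns, np.array([0, 1, 0, 1]))
```

The loss is minimized when queries of a task share an attention mode that differs from other tasks', which is exactly the behavior the regularizer is meant to encourage.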

2. METHOD

Task Formulation For large-scale text retrieval, we aim to find a document d+ containing relevant knowledge from a large collection of documents D to answer a query q. Although input queries vary from task to task, we propose employing only one retriever to process them uniformly. Specifically, given in-domain datasets C = {T_1, T_2, . . . , T_t} and out-of-domain datasets C' = {T_{t+1}, T_{t+2}, . . . , T_{t+k}}, where t and k are the numbers of tasks with and without training samples respectively, the goal is to learn a neural retriever P(d+ | q, D; θ) (θ denotes the parameters of the model) on C that performs well on these in-domain tasks, while transferring the learned knowledge to process a new query q from the out-of-domain datasets C'. Thus, given any query in C ∪ C', one can find the proper knowledge document d+ following P(d+ | q, D; θ). Model Overview Building upon a pre-trained neural retriever, HYPER aims to dynamically synthesize suitable prefixes that enable the retriever to process different queries uniformly, and an illustration
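At inference time, the bi-encoder objective P(d+ | q, D; θ) reduces to a nearest-neighbor search over pre-encoded document vectors. A minimal sketch of that first-stage retrieval step, with all names and dimensions illustrative rather than taken from the paper:

```python
import numpy as np

def retrieve(query_vec, doc_matrix, k=3):
    """Dense first-stage retrieval: rank documents by inner product with the query.

    query_vec:  (d,)   encoded query
    doc_matrix: (N, d) pre-encoded document collection D
    Returns indices and scores of the top-k documents.
    """
    scores = doc_matrix @ query_vec      # (N,) one inner product per document
    topk = np.argsort(-scores)[:k]       # highest-scoring documents first
    return topk, scores[topk]

# toy collection of 5 documents in a 4-dim embedding space
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
])
q = np.array([0.0, 0.0, 0.9, 0.1])
idx, sc = retrieve(q, docs, k=2)
```

In practice, collections of millions to billions of documents require an approximate nearest-neighbor index rather than the exhaustive scan shown here; HYPER leaves this retrieval step unchanged and only alters how the query vector is produced.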



1 Note that out-of-domain data here refers to different tasks with different types of queries, or to the same task with data from different domains.

