UNLEASH MODEL CAPACITY FOR UNIVERSAL DENSE RETRIEVAL BY TASK SPECIALTY OPTIMIZATION

Abstract

Universal dense retrieval, with one unified representation space to empower various retrieval scenarios, has many appealing advantages in simplicity, efficiency, and the potential to break echo chambers with cross-scenario information access. However, standard multi-task trained dense retrievers often fail to match the accuracy of scenario-specific models. In this paper, we analyze multi-task learning in universal retrieval and show that model capacity is not the main bottleneck; rather, the optimization fails to fully utilize the network parameters to capture task-specific signals. This motivates our development of TACO-DR, which conducts multi-task learning for universal retrieval with TAsk speCialty Optimization. TACO-DR dynamically adjusts the learning rate of each parameter for each task based on its task-specific sensitivity, encouraging parameters to better capture task-specific signals. On the KILT benchmark, TACO-DR outperforms various multi-task learning methods and achieves better overall accuracy than single-task models. Our analysis shows that TACO-DR better utilizes the model capacity with more task-specific parameters. Our code and model checkpoints will be open-sourced.

1. INTRODUCTION

With pretrained language models (Lee et al., 2019) and dedicated training strategies (Karpukhin et al., 2020a; Xiong et al., 2021), dense retrieval systems now effectively learn a dense representation space in which queries and relevant documents are matched as nearest neighbors. This representation-based retrieval approach provides strong empirical benefits in various scenarios, both with retrieval as the end goal (Bajaj et al., 2016) and as the first-stage retrieval component of many language systems (Lewis et al., 2020).

A promising potential of dense retrieval is to unify various scenarios via one representation space that represents and matches different types of information, e.g., text and image (Liu et al., 2022), and different types of queries, e.g., keywords, questions, and conversations (Petroni et al., 2021). Such universal retrieval (Maillard et al., 2021) leads to instant efficiency benefits, as one document index can support multiple scenarios. It also helps break information boundaries between scenarios, with one unified entrance for all user information needs. Ideally, universal retrieval over multiple scenarios would lead to more accurate retrieval than single-scenario systems, with the advantage of multi-task learning. However, recent research observed ambivalent empirical performance of universal retrieval, especially when capturing a large number of retrieval tasks in one universal system (Maillard et al., 2021). This clouds the promise of universal retrieval, as it becomes a trade-off between efficiency and accuracy.

In this paper, we conduct a thorough investigation of the challenges of multi-task learning in universal retrieval. Our analysis of several state-of-the-art retrieval systems on the KILT benchmark indicates that network capacity is not yet the main factor limiting universal retrieval accuracy.
Though multi-task learning guides the parameters to capture task-specific or shared signals, the optimization is not sufficient: a large fraction of parameters are not well utilized to capture task-specific signals, as reflected by their low sensitivity (Liang et al., 2022) to each task.

Motivated by these observations, we develop TACO-DR, a "TAsk speCialty Optimized universal Dense Retriever" that improves universal retrieval by optimizing the task specialty of neural parameters during multi-task training. TACO-DR first utilizes task identifier prompts in its query encoder to improve the model's task awareness without compromising the universality of the representation space. TACO-DR then introduces task-sensitivity guided optimization (T-SAGE), which dynamically adjusts the gradient step size of each parameter according to its sensitivity to different tasks, encouraging parameters to capture more task-specific signals.

To demonstrate the advantages of TACO-DR, we conduct experiments on the eight retrieval tasks in the standard KILT benchmark, covering scenarios such as fact checking, entity linking, slot filling, OpenQA, and dialogue (Petroni et al., 2021). While standard multi-task dense retrievers fail to outperform their single-task counterparts, TACO-DR outperforms both single-task models and other, more advanced multi-task learning techniques. Our ablation confirms that its effectiveness stems from its task identifier prompts and task-sensitivity guided optimization.

Our further analysis reveals how TACO-DR better leverages the model parameters for universal retrieval. Compared to standard multi-task learning, TACO-DR activates a larger fraction of the model parameters for each single task. It effectively encourages more model parameters to capture task-specific signals by enforcing the parameters to continuously focus on the tasks that they initially learn during optimization.
As a result, TACO-DR better utilizes the model parameters to capture the various training signals from multiple tasks and achieves strong retrieval accuracy with one universal retrieval system serving many scenarios.
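To make the mechanism described above concrete, the following is a minimal, hypothetical sketch of task-sensitivity guided per-parameter step-size scaling. The function names (`sensitivity`, `tsage_step`), the first-order sensitivity approximation, and the normalization scheme are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sensitivity(param, grad):
    # First-order sensitivity approximation in the spirit of Liang et al.
    # (2022): |theta * g| estimates how much the task loss would change
    # if this parameter were removed.
    return np.abs(param * grad)

def tsage_step(param, task_grads, base_lr=1e-3, eps=1e-8):
    """Hypothetical sketch of T-SAGE-style optimization: each parameter's
    step for a task is scaled by that task's share of the parameter's
    total sensitivity, so parameters keep specializing in the tasks
    they are already most sensitive to."""
    sens = np.stack([sensitivity(param, g) for g in task_grads])  # (T, ...)
    weights = sens / (sens.sum(axis=0) + eps)   # per-task sensitivity share
    update = sum(w * g for w, g in zip(weights, task_grads))
    return param - base_lr * update
```

Under this sketch, a parameter that is sensitive only to task A receives (nearly) the full step from task A's gradient and almost none from other tasks, which is one way to encourage task-specialized parameters.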

2. RELATED WORK

Learning text representations other than discrete bag-of-words has been a long-desired goal in information retrieval (Deerwester et al., 1990; Huang et al., 2013). With the benefits of pretrained language models (Kenton & Toutanova, 2019; Lee et al., 2019) and effective hard negative sampling in finetuning (Karpukhin et al., 2020a; Xiong et al., 2021), dense retrieval systems have shown strong effectiveness on many retrieval scenarios (Qu et al., 2020; Herzig et al., 2021; Chang et al., 2021).

Providing a centralized entry to multiple information sources has various advantages: it is more convenient for the user, reduces the information barrier between scenarios, and provides more diverse information access. Previously, this was achieved by complex divide-and-conquer systems, e.g., in federated search (Arguello et al., 2009). Universal retrieval provides a simpler solution, with one unified representation space empowering multiple scenarios (Sciavolino, 2021), by embedding all data formats using neural encoders (Baevski et al., 2022). Recent research on universal retrieval often leverages multi-task learning to train one dual-encoder model for multiple retrieval tasks. Karpukhin et al. (2020a) trained a DPR model on four OpenQA tasks and observed accuracy gains on some tasks. Liu et al. (2022) show the effectiveness of one unified embedding space for retrieving both texts and images. When the number of retrieval tasks grows, multi-task learning becomes challenging: a DPR model multi-task trained on eight KILT tasks (Petroni et al., 2021) does not provide much accuracy improvement over single-task models (Maillard et al., 2021). Autoregressive retrieval, which directly generates document identifiers from the query, is another potential solution (De Cao et al., 2020), but it currently thrives mainly in scenarios where the target corpus is small (Bevilacqua et al., 2022) or natural document identifiers exist (e.g., entity names (Chen et al., 2022)).
Recent research has observed various optimization challenges in multi-task learning with modern deep learning systems (Ruder, 2017): gradient conflicts, task imbalance, and high variance of gradient magnitudes, to name a few (Yu et al., 2020). Various techniques have been developed to address these challenges. A common approach is to operate on the gradients during stochastic updates, e.g., by projecting conflicting task gradients onto each other's normal planes (Yu et al., 2020), modifying their directions and magnitudes (Wang et al., 2020), or encouraging updates toward common directions (Piratla et al.).
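As an illustration of gradient-level multi-task techniques such as the one in Yu et al. (2020), the following is a minimal sketch of conflicting-gradient projection; the function name `pcgrad` and the fixed iteration order (the original samples task order randomly) are simplifying assumptions:

```python
import numpy as np

def pcgrad(grads):
    """Sketch of gradient surgery: when two task gradients conflict
    (negative inner product), project one onto the normal plane of the
    other before summing, removing the conflicting component."""
    projected = [g.astype(float).copy() for g in grads]
    for gi in projected:
        for gj in grads:
            dot = gi @ gj
            if dot < 0:  # conflicting directions
                gi -= dot / (gj @ gj) * gj  # remove component along gj
    return sum(projected)
```

For example, with `g1 = [1, 0]` and `g2 = [-1, 1]`, the conflicting component of `g1` along `g2` is removed, so the projected `g1` becomes orthogonal to `g2` rather than fighting it.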

3. PRELIMINARIES

In this section, we recap the preliminaries of dense retrieval and the constraints of universal retrieval.

Dense Retrieval. Given a query q and a corpus C, the retrieval task is to find a set of relevant documents d ∈ C for q, often by using a retrieval function f (q, d; θ). Dense retrieval (DR) systems
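The retrieval function f(q, d; θ) in a dual-encoder DR system is typically the inner product of dense query and document encodings. The sketch below illustrates this scoring-and-ranking scheme with a toy hash-based stand-in encoder (`encode` and `retrieve` are illustrative names; a real system uses a trained language-model encoder and an approximate nearest neighbor index rather than exhaustive scoring):

```python
import numpy as np

def encode(text, dim=8):
    # Toy stand-in encoder: hashes tokens into a dense unit vector.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        rng = np.random.default_rng(abs(hash(tok)) % (2**32))
        vec += rng.standard_normal(dim)
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(query, corpus, k=2):
    # f(q, d; theta) as an inner product of encodings; return top-k docs.
    q = encode(query)
    scores = [(float(q @ encode(d)), d) for d in corpus]
    return [d for _, d in sorted(scores, reverse=True)[:k]]
```

In a universal retrieval setting, the same document index (precomputed document encodings) is shared across all tasks, and only the query-side input varies by scenario.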

