AUGMENTING ZERO-SHOT DENSE RETRIEVERS WITH PLUG-IN MIXTURE-OF-MEMORIES

Anonymous

Abstract

In this paper we improve the zero-shot generalization ability of language models via Mixture-of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memories at inference time. We develop a joint learning mechanism that trains the augmentation component with latent labels derived from the end retrieval task, paired with hard negatives from the memory mixture. We instantiate the model in a zero-shot dense retrieval setting by augmenting a strong T5-based retriever with MoMA. Our model, MoMA-DR, obtains strong zero-shot retrieval accuracy on the eighteen tasks included in the standard BEIR benchmark. It outperforms other dense retrieval models of similar scale and achieves accuracy comparable to systems that seek generalization from larger encoder models or vector indices. Our analysis illustrates the necessity of augmenting with a mixture of memories for robust generalization, the benefits of joint learning, and how MoMA-DR utilizes the plug-in memory at inference time without changing its parameters. We plan to open-source our code.

1. INTRODUCTION

Scaling up language models, with more parameters, compute, and annotation data, improves model generalization on downstream applications (Raffel et al., 2019; Brown et al., 2020; Smith et al., 2022), but with diminishing returns: linear improvements on downstream metrics often require exponentially more parameters and compute (Kaplan et al., 2020; Hoffmann et al., 2022). Hence, scaling pretrained language models in this way is economically unsustainable (Strubell et al., 2020; Bender et al., 2021; Zhang et al., 2022). Retrieval-augmented language models provide a promising alternative. They allow language models to efficiently access vast resources from an external corpus (Guu et al., 2020; Borgeaud et al., 2022) that serves as a kind of "memory" they can refer to when making predictions, alleviating the need to memorize as much information in their own network parameters (Roberts et al., 2020). This open-book approach helps language models better generalize on token prediction tasks and machine translation (Khandelwal et al., 2019; Borgeaud et al., 2022), as well as tasks that already involve a first-stage retrieval component, e.g., OpenQA (Borgeaud et al., 2022; Izacard et al., 2022).

In this paper we improve the zero-shot generalization ability of language models using "mixture-of-memory" (MoMA), a new retrieval augmentation mechanism. Instead of a single corpus, MoMA retrieves documents from a "mixture" of multiple external corpora. This mechanism also allows removing and/or "plugging in" new corpora at inference time, when more information from the target task is revealed, or as an additional way for users to control the model. It is not trivial to guide a retrieval model to leverage multiple corpora; we need to jointly train the augmentation component and the dense retriever using supervised relevance signals and self-mined hard negatives.
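To make the mixture-of-memory idea concrete, the following is a minimal, hypothetical sketch of retrieval over multiple memories: each corpus keeps its own document embeddings, so a new corpus can be plugged in (or removed) at inference time without any parameter update. All class and method names here are illustrative, not the paper's released code, and dot-product scoring stands in for the learned retriever.

```python
import numpy as np

class MixtureOfMemories:
    """Toy mixture-of-memory store: one embedding matrix per corpus."""

    def __init__(self):
        self.memories = {}  # corpus name -> (doc texts, doc embeddings)

    def plug_in(self, name, docs, embeddings):
        # Adding a memory touches no model parameters, mirroring the
        # plug-in behavior described in the text.
        self.memories[name] = (docs, np.asarray(embeddings, dtype=np.float32))

    def remove(self, name):
        self.memories.pop(name, None)

    def retrieve(self, query_emb, k=3):
        # Score every document in every memory by dot product and
        # return the top-k (score, corpus, doc) triples overall.
        scored = []
        for name, (docs, embs) in self.memories.items():
            scores = embs @ query_emb
            for i, s in enumerate(scores):
                scored.append((float(s), name, docs[i]))
        scored.sort(key=lambda t: -t[0])
        return scored[:k]

# Toy usage with 2-d embeddings and two memories.
mom = MixtureOfMemories()
mom.plug_in("wikipedia", ["doc_w1", "doc_w2"], [[1.0, 0.0], [0.0, 1.0]])
mom.plug_in("pubmed", ["doc_p1"], [[0.9, 0.1]])
top = mom.retrieve(np.array([1.0, 0.0], dtype=np.float32), k=2)
```

In the actual system, the per-memory scoring would be a trained dense retriever over an approximate nearest-neighbor index rather than a brute-force dot product, but the separation of memories is what enables plug-in behavior.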
We instantiate MoMA with a T5 encoder-decoder model (Ni et al., 2022) and apply it to the dense retrieval task (Karpukhin et al., 2020). Our resulting retrieval system, MoMA-DR, uses a set of augmenting documents from the mixture of memories to enhance its representation of the query with important context; the retriever then uses the enhanced query representation to retrieve a final candidate set. At inference time, we plug the target task's corpus into the memory mixture to introduce in-domain context, without updating any parameters.
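The two-stage flow above can be sketched as follows. This is a simplified stand-in under stated assumptions: the real MoMA-DR feeds the augmenting documents through its T5 encoder to produce the enhanced query, whereas here a plain embedding average plays that role, and the function names are hypothetical.

```python
import numpy as np

def enhance_query(query_emb, aug_embs):
    # Illustrative fusion: average the query embedding with its
    # augmenting documents (a stand-in for encoding query + augmenting
    # documents jointly with the T5 encoder).
    stacked = np.vstack([query_emb] + list(aug_embs))
    return stacked.mean(axis=0)

def dense_retrieve(query_emb, corpus_embs, k):
    # Brute-force dot-product retrieval: indices of the top-k docs.
    scores = corpus_embs @ query_emb
    return np.argsort(-scores)[:k]

# Toy setup: the plugged-in memory pulls the query representation
# toward the in-domain region of the target corpus.
memory_embs = np.array([[0.0, 1.0], [0.1, 0.9]], dtype=np.float32)
target_corpus = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
                         dtype=np.float32)
query = np.array([1.0, 0.0], dtype=np.float32)

# Stage 1: fetch augmenting documents from the memory mixture.
aug_ids = dense_retrieve(query, memory_embs, k=1)
# Stage 2: retrieve final candidates with the enhanced query.
q_plus = enhance_query(query, memory_embs[aug_ids])
final = dense_retrieve(q_plus, target_corpus, k=2)
```

Note that only the plugged-in corpus changes between domains; the enhancement and retrieval functions (the model parameters, in the real system) stay fixed.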

