ZERO-SHOT RETRIEVAL WITH SEARCH AGENTS AND HYBRID ENVIRONMENTS

Abstract

Learning to search is the task of building artificial agents that learn to use a search box autonomously to find information. So far, it has been shown that current language models can learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers. We extend the previous learning to search setup to a hybrid environment, which accepts discrete query refinement operations after a first-pass retrieval step via a dual encoder. Experiments on the BEIR benchmark show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual encoder retriever and cross encoder reranker. Furthermore, we find that simple heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance by several nDCG points. The search agent based on HRE (HARE) matches state-of-the-art performance in both zero-shot and in-domain evaluations, via interpretable actions, and at twice the speed.

1. INTRODUCTION

Transformer-based dual encoders for retrieval, and cross encoders for ranking (cf., e.g., Karpukhin et al. (2020); Nogueira & Cho (2019)), have redefined the architecture of choice for information search systems. However, sparse term-based inverted index architectures still hold their ground, especially in out-of-domain, or zero-shot, evaluations. On the one hand, neural encoders are prone to overfitting on training artifacts (Lewis et al., 2021). On the other, sparse methods such as BM25 (Robertson & Zaragoza, 2009) may implicitly benefit from term-overlap bias in common datasets (Ren et al., 2022). Recent work has explored various forms of dense-sparse hybrid combinations to strike better bias-variance tradeoffs (Khattab & Zaharia, 2020; Formal et al., 2021b; Chen et al., 2021; 2022). Rosa et al. (2022) evaluate a simple hybrid design which takes out the dual encoder altogether and simply applies a cross encoder reranker to the top documents retrieved by BM25. This solution couples the better generalization properties of BM25 with high-capacity cross encoders, setting the current SOTA on BEIR by reranking 1000 documents. However, it is not very practical, as reranking is computationally expensive. More fundamentally, it is not easy to gain insight into why results are reranked the way they are. Thus, the implicit opacity of neural systems is not addressed.

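The retrieve-then-rerank design discussed above can be sketched end to end. The snippet below is a toy illustration rather than any paper's implementation: `bm25_scores` is a minimal in-memory BM25, and `overlap_score` is a deliberately simple stand-in for a learned cross-encoder score.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Toy BM25 over whitespace-tokenized docs (stand-in for a real inverted index)."""
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(t) for t in tokenized) / N
    df = Counter()  # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores

def overlap_score(query, doc):
    """Placeholder 'cross-encoder': token-overlap ratio instead of a transformer."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query, candidates, score_fn):
    """Re-sort first-pass candidates by the (placeholder) reranker score."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)

docs = [
    "bm25 is a sparse term based retrieval model",
    "dual encoders embed queries and documents densely",
    "cross encoders rerank retrieved documents jointly",
]
query = "sparse retrieval model"
# First pass: keep the top-k documents by BM25 score.
first_pass = [d for _, d in sorted(zip(bm25_scores(query, docs), docs),
                                   reverse=True)][:2]
# Second pass: rerank the shortlist with the (placeholder) cross-encoder.
top = rerank(query, first_pass, overlap_score)
```

In a real system the shortlist would be much larger (Rosa et al. rerank 1000 documents), which is exactly where the quadratic query-document attention of a transformer cross encoder becomes the computational bottleneck.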
We propose a novel hybrid design based on the Learning to Search (L2S) framework (Adolphs et al., 2022). In L2S the goal is to learn a search agent that autonomously interacts with the retrieval environment to improve results. By iteratively leveraging pseudo relevance feedback (Rocchio, 1971) and language models' understanding, search agents engage in a goal-oriented traversal of the answer space, which aspires to model human searchers' ability to 'rabbit hole' (Russell, 2019). The framework is also appealing because of the interpretability of the agent's actions. Adolphs et al. (2022) show that search agents based on large language models can learn effective symbolic search policies in a sparse retrieval environment, but fail to outperform neural retrievers. We extend L2S to a dense-sparse hybrid agent-environment framework structured as follows. The environment relies on both a state-of-the-art dual encoder, GTR (Ni et al., 2021), and BM25, which separately access the document collection. Results are combined and sorted by means of a transformer cross encoder reranker (Jagerman et al., 2022). We call this a Hybrid Retrieval Environment (HRE). Our search agent (HARE) interacts with HRE by iteratively refining the query via search operators and aggregating the best results. HARE matches state-of-the-art results on the

