ANSWERING COMPLEX OPEN-DOMAIN QUESTIONS WITH MULTI-HOP DENSE RETRIEVAL

Abstract

We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. In contrast to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.

1. INTRODUCTION

Open-domain question answering is a challenging task in which the answer to a given question must be extracted from a large pool of documents. The prevailing approach (Chen et al., 2017) tackles the problem in two stages. Given a question, a retriever first produces a list of k candidate documents, and a reader then extracts the answer from this set. Until recently, retrieval models depended on traditional term-based information retrieval (IR) methods, which fail to capture the semantics of the question beyond lexical matching and remain a major performance bottleneck for the task. Recent work on dense retrieval instead uses pretrained encoders to cast the question and documents into dense representations in a vector space, and relies on fast maximum inner-product search (MIPS) to complete the retrieval. These approaches (Lee et al., 2019; Guu et al., 2020; Karpukhin et al., 2020) have demonstrated significant retrieval improvements over traditional IR baselines. However, such methods remain limited to simple questions, where the answer is explicit in a single piece of text evidence. In contrast, complex questions typically involve aggregating information from multiple documents, requiring logical reasoning or sequential (multi-hop) processing to infer the answer (see Figure 1 for an example). Since the process for answering such questions may be sequential in nature, single-shot approaches to retrieval are insufficient. Instead, iterative methods are needed to recursively retrieve new information at each step, conditioned on the information already at hand. Beyond further expanding the scope of existing textual open-domain QA systems, answering more complex questions usually involves multi-hop reasoning, which poses unique challenges for existing neural-based AI systems.
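To make the single-shot retrieval pipeline above concrete, here is a rough NumPy sketch of dense retrieval via exhaustive maximum inner-product search. It is not the paper's implementation: the corpus size, vector dimension, and the hash-based `encode` function are all illustrative stand-ins, and a real system would use a pretrained encoder (e.g. BERT) and an approximate-nearest-neighbor library such as FAISS.

```python
import numpy as np

# Toy corpus of passage embeddings (stand-ins for encoder outputs).
rng = np.random.default_rng(0)
passage_embs = rng.normal(size=(1000, 128)).astype(np.float32)

def encode(text: str) -> np.ndarray:
    """Hypothetical question encoder: maps text to a dense vector."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=128).astype(np.float32)

def mips_topk(query_vec: np.ndarray, k: int = 5) -> list[int]:
    """Maximum inner-product search: rank all passages by dot product."""
    scores = passage_embs @ query_vec
    return np.argsort(-scores)[:k].tolist()

top = mips_topk(encode("Who was Judy Lewis's father?"), k=5)
```

A reader model would then extract the answer from the top-k passages; at corpus scale the dense matmul is replaced by an approximate MIPS index.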
(Figure 1 example) Q: What was the nickname of Judy Lewis's father? P1: Judy Lewis (born Judith Young; November 6, 1935 - November 25, 2011) was an American actress, writer, producer, and therapist. She was the secret biological daughter of actor Clark Gable and actress Loretta Young. P2: William Clark Gable (February 1, 1901 - November 16, 1960) was an American film actor, often referred to as "The King of Hollywood". He had roles in more than 60 motion pictures in a wide variety of genres during a career that lasted 37 years...

With its practical and research values, multi-hop QA has been extensively studied (Talmor & Berant, 2018; Yang et al., 2018; Welbl et al., 2018) and remains an active research area in NLP (Qi et al., 2019; Nie et al., 2019; Min et al., 2019; Zhao et al., 2020; Asai et al., 2020; Perez et al., 2020). The main problem in answering multi-hop open-domain questions is that the search space grows exponentially with each retrieval hop. Most recent work tackles this issue by constructing a document graph using either entity linking or the existing hyperlink structure of the underlying Wikipedia corpus (Nie et al., 2019; Asai et al., 2020). The problem then becomes finding the best path in this graph, where the search space is bounded by the number of hyperlinks in each passage. However, such methods may not generalize to new domains, where entity linking might perform poorly or hyperlinks might not be as abundant as in Wikipedia. Moreover, efficiency remains a challenge despite these data-dependent pruning heuristics, with the best model (Asai et al., 2020) needing hundreds of calls to large pretrained models to produce a single answer.


In contrast, we propose to apply dense retrieval to the multi-hop setting with a simple recursive framework. Our method iteratively encodes the question and previously retrieved documents as a query vector and retrieves the next relevant documents using efficient MIPS methods. With high-quality dense representations derived from strong pretrained encoders, our work first demonstrates that the sequence of documents providing sufficient information to answer a multi-hop question can be accurately discovered from unstructured text, without the help of corpus-specific hyperlinks. When evaluated on two multi-hop benchmarks, HotpotQA (Yang et al., 2018) and a multi-evidence subset of FEVER (Thorne et al., 2018), our approach improves greatly over traditional linking-based retrieval methods. More importantly, the better retrieval results also lead to state-of-the-art downstream results on both datasets. On HotpotQA, we demonstrate a vastly improved efficiency-accuracy trade-off: by limiting the amount of retrieved context fed into downstream models, our system can match the best published result while being 10x faster.
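The recursive loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the trained system: the hash-based `encode_query` stands in for the pretrained encoder, the corpus is random, and a greedy beam of size 1 replaces the full retriever.

```python
import numpy as np

# Toy corpus; a real system stores encoder outputs in a MIPS index.
rng = np.random.default_rng(1)
N_PASSAGES, DIM = 500, 64
passage_embs = rng.normal(size=(N_PASSAGES, DIM)).astype(np.float32)
passage_texts = [f"passage {i}" for i in range(N_PASSAGES)]

def encode_query(question: str, history: list[str]) -> np.ndarray:
    """Hypothetical stand-in for a pretrained encoder that reads the
    question concatenated with previously retrieved passages."""
    seed = abs(hash(" [SEP] ".join([question] + history))) % (2**32)
    return np.random.default_rng(seed).normal(size=DIM).astype(np.float32)

def multi_hop_retrieve(question: str, hops: int = 2) -> list[int]:
    """Each hop re-encodes the query conditioned on retrieved text,
    then runs MIPS again (greedy, i.e. beam size 1)."""
    history: list[str] = []
    chain: list[int] = []
    for _ in range(hops):
        q_vec = encode_query(question, history)
        best = int(np.argmax(passage_embs @ q_vec))  # exhaustive MIPS
        chain.append(best)
        history.append(passage_texts[best])  # condition the next hop
    return chain

chain = multi_hop_retrieve("What was the nickname of Judy Lewis's father?")
```

The key design choice is that the query vector for hop t is a function of both the question and the passages retrieved at hops 1..t-1, so each MIPS call is conditioned on the evidence gathered so far.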

2. METHOD

2.1. PROBLEM DEFINITION

The retrieval task considered in this work can be described as follows (see also Figure 1). Given a multi-hop question q and a large text corpus C, the retrieval module needs to retrieve a sequence of passages P_seq = {p_1, p_2, ..., p_n} that provides sufficient information for answering q. Practically, the retriever returns the k best-scoring sequence candidates, {P_seq^1, P_seq^2, ..., P_seq^k} (k ≪ |C|), with the hope that at least one of them has the desired qualities. k should be small enough for downstream modules to process in a reasonable time while maintaining adequate recall. In general, retrieval also needs to be efficient enough to handle real-world corpora containing millions of documents.

2.2. MULTI-HOP DENSE RETRIEVAL

Model. Based on the sequential nature of the multi-hop retrieval problem, our system solves it in an iterative fashion. We model the probability of selecting a certain passage sequence as follows:

P(P_seq | q) = ∏_{t=1}^{n} P(p_t | q, p_1, ..., p_{t-1}).
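This factorization can be realized with beam search over passage sequences, accumulating per-hop log-probabilities. The sketch below assumes P(p_t | q, p_1, ..., p_{t-1}) is a softmax over inner-product scores; the random embeddings and the hash-based `query_vec` encoder are hypothetical stand-ins for the trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
N, DIM, BEAM, HOPS = 200, 32, 4, 2
passage_embs = rng.normal(size=(N, DIM)).astype(np.float32)

def query_vec(seq: tuple[int, ...]) -> np.ndarray:
    """Stand-in for encoding the question plus the passages in `seq`."""
    seed = hash(seq) % (2**32)
    return np.random.default_rng(seed).normal(size=DIM).astype(np.float32)

def beam_search() -> list[tuple[tuple[int, ...], float]]:
    beams = [((), 0.0)]  # (passage sequence, accumulated log-score)
    for _ in range(HOPS):
        candidates = []
        for seq, score in beams:
            logits = passage_embs @ query_vec(seq)
            # log P(p_t | q, p_1, ..., p_{t-1}) via softmax over the corpus
            m = logits.max()
            log_probs = logits - m - np.log(np.exp(logits - m).sum())
            for p in np.argsort(-log_probs)[:BEAM]:
                candidates.append((seq + (int(p),), score + float(log_probs[p])))
        beams = sorted(candidates, key=lambda c: -c[1])[:BEAM]
    return beams

beams = beam_search()  # k best-scoring passage sequences
```

Because each hop conditions only on the sequence retrieved so far, the product factorization lets the beam prune the exponential search space to k candidates per hop.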



Figure 1: An overview of the multi-hop dense retrieval approach.

