UNIKGQA: UNIFIED RETRIEVAL AND REASONING FOR SOLVING MULTI-HOP QUESTION ANSWERING OVER KNOWLEDGE GRAPH

Abstract

Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question on a large-scale Knowledge Graph (KG). To cope with the vast search space, existing work usually adopts a two-stage approach: it first retrieves a relatively small subgraph related to the question and then performs reasoning on the subgraph to find the answer entities accurately. Although these two stages are highly related, previous work employs very different technical solutions for developing the retrieval and reasoning models, neglecting their essential relatedness as tasks. In this paper, we propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning. For model architecture, UniKGQA consists of a semantic matching module based on a pre-trained language model (PLM) for question-relation semantic matching, and a matching information propagation module that propagates the matching information along the directed edges of the KG. For parameter learning, we design a shared pre-training task based on question-relation matching for both the retrieval and reasoning models, and then propose retrieval- and reasoning-oriented fine-tuning strategies. Compared with previous studies, our approach is more unified, tightly relating the retrieval and reasoning stages. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our method on the multi-hop KGQA task.

1. INTRODUCTION

With the availability of large-scale knowledge graphs (KGs), such as Freebase (Bollacker et al., 2008) and Wikidata (Tanon et al., 2016), knowledge graph question answering (KGQA) has become an important research topic that aims to find the answer entities of natural language questions from KGs. Recent studies (Lan et al., 2021) mainly focus on multi-hop KGQA, a more complex scenario where sophisticated multi-hop reasoning over edges (or relations) is required to infer the correct answer on the KG. We show an example in Figure 1(a). Given the question "Who is the wife of the nominee for The Jeff Probst Show?", the goal is to find a reasoning path from the topic entity "The Jeff Probst Show" to the answer entities "Shelley Wright" and "Lisa Ann Russell". Faced with the vast search space in large-scale KGs, previous work (Sun et al., 2018; 2019) typically adopts a retrieval-then-reasoning approach to achieve a good trade-off. Generally, the retrieval stage aims to extract relevant triples from the large-scale KG to compose a relatively small question-relevant subgraph, while the reasoning stage focuses on accurately finding the answer entities in the retrieved subgraph. Although the purposes of the two stages are different, both stages

Availability

Our code and data are publicly available at https://github.com/RUCAIBox/UniKGQA.

