DECAF: JOINT DECODING OF ANSWERS AND LOGICAL FORMS FOR QUESTION ANSWERING OVER KNOWLEDGE BASES

Abstract

Question answering over knowledge bases (KBs) aims to answer natural language questions with factual information such as entities and relations in KBs. Previous methods either generate logical forms that can be executed over KBs to obtain final answers, or predict answers directly. Empirical results show that the former often produces more accurate answers, but it suffers from non-execution issues due to potential syntactic and semantic errors in the generated logical forms. In this work, we propose a novel framework, DECAF, that jointly generates both logical forms and direct answers, and then combines their merits to obtain the final answers. Moreover, different from most previous methods, DECAF is based on simple free-text retrieval without relying on any entity linking tools; this simplification eases its adaptation to different datasets. DECAF achieves new state-of-the-art accuracy on the WebQSP, FreebaseQA, and GrailQA benchmarks, while achieving competitive results on the ComplexWebQuestions benchmark.

1. INTRODUCTION

Knowledge Base Question Answering (KBQA) aims to answer natural language questions using knowledge from KBs such as DBpedia (Auer et al., 2007), Freebase (Bollacker et al., 2008), or Wikidata (Vrandečić & Krötzsch, 2014). Existing methods can be divided into two categories. One category is based on semantic parsing, where models first parse the input question into a logical form (e.g., SPARQL (Prud'hommeaux, 2011) or S-expression (Gu et al., 2021)) and then execute the logical form against the knowledge base to obtain the final answers (Das et al., 2021; Gu et al., 2021; Ye et al., 2022). The other category directly outputs answers without relying on a logical-form executor (Lan et al., 2019; Sun et al., 2019; Saxena et al., 2022; Oguz et al., 2022): these methods either classify the entities in the KB to decide which are the answers (Sun et al., 2019) or generate the answers with a sequence-to-sequence framework (Saxena et al., 2022; Oguz et al., 2022).

Previous empirical results (Ye et al., 2022; Das et al., 2021; Gu et al., 2022) show that semantic parsing based methods produce more accurate answers on benchmark datasets. However, due to syntactic and semantic restrictions, the output logical forms can often be non-executable and thus produce no answers at all. Direct-answer-prediction methods, on the other hand, are guaranteed to output answers, although their accuracy is usually lower than that of semantic parsing based methods, especially on complex questions that require multi-hop reasoning (Talmor & Berant, 2018). To our knowledge, none of the previous studies have leveraged the advantages of both types of methods. Moreover, since knowledge bases are usually large-scale with millions of entities, most previous methods rely on entity linking to select relevant information from the KB for answering questions.
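The contrast between the two paradigms can be illustrated with a toy example. The mini knowledge base, triple-pattern "logical form", and executor below are hypothetical illustrations for exposition only, not the datasets or logical forms used in this work:

```python
# Toy illustration of the semantic-parsing route in KBQA.
# The KB triples and the (subject, relation) query format are
# hypothetical examples, not actual Freebase data or SPARQL.

KB = {
    ("Barack_Obama", "born_in", "Honolulu"),
    ("Honolulu", "located_in", "Hawaii"),
}

def execute(logical_form):
    """Run a (subject, relation) pattern against the KB and return
    all matching objects. A malformed form is non-executable."""
    try:
        subj, rel = logical_form
    except ValueError:
        # Mirrors the non-execution issue: a syntactically broken
        # logical form yields no answers at all.
        raise SyntaxError("non-executable logical form")
    return {o for (s, r, o) in KB if s == subj and r == rel}

# A well-formed logical form executes and gives a precise answer.
print(execute(("Barack_Obama", "born_in")))  # {'Honolulu'}
```

A direct-answer-prediction model would instead emit the answer string "Honolulu" from a sequence-to-sequence decoder with no executor involved, so it always returns something, but with no KB-level guarantee of correctness.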
However, these entity linking tools are usually designed for specific datasets, which inevitably limits the generalization ability of such methods. In this paper, we propose a novel framework, DECAF, to overcome these limitations: (1) instead of relying only on logical forms or only on direct answers, DECAF jointly decodes both, and further combines the answers obtained by executing the logical forms with the directly generated ones to obtain the final answers.
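The combination idea in (1) can be sketched as a simple weighted merge over the two answer sources: when the logical form executes, its answers are favored; when it fails, the directly generated answers serve as a fallback. The function and weights below are a hypothetical illustration, not DECAF's actual combination scheme:

```python
def combine_answers(executed, generated, exec_weight=1.0, gen_weight=0.5):
    """Merge answers from the two decoding routes.

    executed:  answer set from running the logical form, or None if
               the form was non-executable.
    generated: directly decoded answer strings.
    The weights here are illustrative placeholders.
    """
    scores = {}
    if executed:  # logical form ran and returned answers
        for a in executed:
            scores[a] = scores.get(a, 0.0) + exec_weight
    for a in generated:
        scores[a] = scores.get(a, 0.0) + gen_weight
    # Rank candidates by combined score; agreement between the two
    # routes pushes an answer to the top.
    return sorted(scores, key=scores.get, reverse=True)

# Executable logical form: executed answers dominate the ranking.
print(combine_answers({"Hawaii"}, ["Hawaii", "Honolulu"]))
# Non-executable logical form: fall back to generated answers alone.
print(combine_answers(None, ["Honolulu"]))
```

This fallback behavior is what lets such a joint approach keep the accuracy of semantic parsing while inheriting the always-answers guarantee of direct prediction.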


* Work done during internship at AWS AI Labs

