MULTI-SPAN QUESTION ANSWERING USING SPAN-IMAGE NETWORK

Abstract

Question-answering (QA) models aim to find an answer given a question and a context. Language models such as BERT are used to associate the question with the context and locate an answer span. Prior work on QA focuses on finding the single best answer. There is a need for multi-span QA models that output the top-K likely answers to questions such as "Which companies did Elon Musk start?" or "What factors cause global warming?" In this work, we introduce the Span-Image architecture, which learns to identify multiple answer spans in a context for a given question. The architecture can incorporate prior information about the span-length distribution or valid span patterns (e.g., the end index must not precede the start index), eliminating the need for post-processing. Span-Image outperforms the state of the art in top-K answer accuracy on the SQuAD dataset and in multi-span answer accuracy on an Amazon internal dataset.

1. INTRODUCTION

Answering questions posed as text to search engines or spoken to virtual assistants like Alexa has become a key feature of information retrieval systems. Publicly available reading comprehension datasets, including WikiQA (Yang et al., 2015), TriviaQA (Joshi et al., 2017), NewsQA (Trischler et al., 2016), and SQuAD (Rajpurkar et al., 2016), have fostered research in QA models. SQuAD is one of the most widely used reading comprehension benchmarks, with an active leaderboard and many participants. Even though some models beat human-level accuracy on SQuAD, QA systems can do well by learning only context and type-matching heuristics (Weissenborn et al., 2017) and may still be far from true language understanding, since they are not robust to adversarial sentences (Jia & Liang, 2017). To better measure performance, SQuAD v2.0 (Rajpurkar et al., 2018) extends v1.1 with questions that have no explicit answer in the given paragraph.

QA can be modeled as the task of predicting the span (i.e., start and end indices) of an answer given a question and an input paragraph. To find the answer span, language representation models such as BERT (Devlin et al., 2019) can be used to associate a question with a given paragraph. BERT is pre-trained on unsupervised tasks using large corpora, and its input representation accepts a sentence pair, which is well suited to taking a question and a passage as input. Fine-tuning BERT on SQuAD yields a QA model. Questions without an answer are treated as having a span that begins and ends at the special BERT token [CLS]. In this way, a BERT-based QA model can return either an actual answer or "no answer" for every question in the SQuAD v1.1 and v2.0 datasets. Prior work on QA assumes the presence of a single answer or the lack of any answer (Seo et al., 2016; Devlin et al., 2019).
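To make the standard single-span formulation concrete, the following is a simplified sketch (not BERT itself, and not this paper's method): the model emits independent start and end logits over the token sequence, the best valid span (end >= start, bounded length) is selected by maximizing the summed logits, and index 0 stands in for the [CLS] token whose span encodes "no answer".

```python
import numpy as np

def decode_single_span(start_logits, end_logits, max_len=30):
    """Return (start, end) of the best-scoring valid span.

    Index 0 plays the role of [CLS]: if the (0, 0) span wins,
    the model predicts "no answer".
    """
    best = (0, 0)
    best_score = start_logits[0] + end_logits[0]  # [CLS] no-answer score
    n = len(start_logits)
    for i in range(1, n):
        for j in range(i, min(n, i + max_len)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Toy logits: tokens 3..5 form the likeliest answer span.
start = np.array([0.1, -1.0, -1.0, 2.0, -1.0, -1.0])
end = np.array([0.2, -1.0, -1.0, -1.0, -1.0, 2.5])
print(decode_single_span(start, end))  # (3, 5)
```

Note that the scoring is additive in the start and end logits, which is exactly the separable assumption the next paragraph criticizes: the score of (i, j) cannot depend jointly on the pair.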
Furthermore, prior work assumes a separable probability distribution function (pdf) for the start and end indices of an answer span, which leads to a separable loss function. This approach has two major disadvantages: 1) it prevents the QA model from predicting multiple spans without post-processing; 2) because the pdf is separable, the model cannot learn to evaluate the compatibility of start and end indices, which degrades performance. Pang et al. (2019) consider hierarchical answer spans by sorting the products of start and end probabilities to support multiple spans; however, they still assume a separable pdf for the start and end indices. To the best of our knowledge, a multi-span QA architecture has not been proposed before. We introduce the Span-Image architecture to enable multi-span answers (or multiple answers) given a question and a paragraph. Each pixel (i, j) in the span image corresponds to a span starting at the i-th position and ending at the j-th. Typical image processing networks like 2D convolutional network layers
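The span-image idea can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: span scores form a 2D map whose pixel (i, j) scores the span from token i to token j; pixels with j < i are masked as invalid, and the top-K surviving pixels yield the top-K answer spans directly, with no separable-pdf assumption and no post-processing.

```python
import numpy as np

def top_k_spans(span_image, k=2):
    """Return the k best (start, end) spans from a 2D span-score map."""
    n = span_image.shape[0]
    # Valid spans live on or above the diagonal (end >= start).
    valid = np.triu(np.ones((n, n), dtype=bool))
    masked = np.where(valid, span_image, -np.inf)
    # Indices of the k largest scores over the flattened map.
    flat = np.argsort(masked, axis=None)[::-1][:k]
    return [tuple(int(x) for x in np.unravel_index(idx, (n, n)))
            for idx in flat]

# Toy 4x4 span image: spans (0, 1) and (3, 3) score highest.
img = np.full((4, 4), -5.0)
img[0, 1] = 3.0   # span covering tokens 0-1
img[3, 3] = 2.0   # single-token span at position 3
img[2, 0] = 9.0   # invalid (end < start); the mask must ignore it
print(top_k_spans(img, k=2))  # [(0, 1), (3, 3)]
```

Because the score is defined jointly over (i, j) rather than as a sum of separate start and end scores, a model producing such a map can learn span-level constraints (e.g., length priors, start/end compatibility) directly.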

