MACHINE READING COMPREHENSION WITH ENHANCED LINGUISTIC VERIFIERS

Abstract

We propose two linguistic verifiers for span-extraction style machine reading comprehension to tackle two challenges: how to evaluate the syntactic completeness of predicted answers, and how to utilize the rich context of long documents. Our first verifier rewrites a question by replacing its interrogatives with the predicted answer phrase, and then builds a cross-attention scorer between the rewritten question and the segment, so that answer candidates are scored in a position-sensitive context. Our second verifier builds a hierarchical attention network to represent segments in a passage, in which neighbouring segments of long passages are recurrently connected and can contribute to the current segment-question pair's inference for answerability classification and boundary determination. We then combine these two verifiers into a pipeline and apply it to the SQuAD2.0, NewsQA and TriviaQA benchmark sets. Our pipeline achieves significant improvements in both exact match and F1 scores over state-of-the-art baselines.
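To illustrate the rewriting step of the first verifier, the sketch below replaces the interrogative of a question with a predicted answer phrase, turning the question into a declarative hypothesis that a scorer could then compare against the segment. This is a minimal, hand-rolled approximation: the interrogative list, the function name `rewrite_question`, and the fallback behaviour are our assumptions for illustration, not the paper's actual implementation.

```python
# Longer interrogatives first, so e.g. "whom"/"whose" match before "who"
# and "how many"/"how much" match before "how".
INTERROGATIVES = [
    "how many", "how much", "whom", "whose", "who",
    "what", "which", "where", "when", "why", "how",
]

def rewrite_question(question: str, answer: str) -> str:
    """Replace the first interrogative in `question` with the predicted
    `answer` phrase, producing a declarative hypothesis sentence."""
    q = question.rstrip("?").strip()
    lowered = q.lower()
    for wh in INTERROGATIVES:
        idx = lowered.find(wh)
        if idx != -1:
            rewritten = q[:idx] + answer + q[idx + len(wh):]
            return rewritten.strip() + "."
    # No interrogative found: append the answer as a simple fallback.
    return q + " " + answer + "."
```

For example, `rewrite_question("Who wrote Hamlet?", "Shakespeare")` yields `"Shakespeare wrote Hamlet."`, which can then be scored against the source segment in a position-sensitive way.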

1. INTRODUCTION

Teaching a machine to read and comprehend large-scale textual documents is a promising and longstanding goal of natural language understanding. This field, known as machine reading comprehension (MRC) (Zhang et al., 2019; 2020c), has achieved impressive milestones in recent years thanks to the release of large-scale benchmark datasets and pretrained contextualized language models (CLMs). For example, on the well-tested span-extraction style SQuAD2.0 dataset 1 , the current best results under the pretraining+fine-tuning framework, employing ALBERT (Lan et al., 2020), are 90.7% exact match (EM) and 93.0% F1, exceeding the human-level scores of 86.8% and 89.5% (Rajpurkar et al., 2016; 2018) by a large margin. MRC is traditionally defined as a question-answering task that outputs answers given passage-question pairs as input. Based on answer types, Chen (2018) classified MRC tasks into four categories: cloze filling of a question with gaps (Ghaeini et al., 2018), multiple choice from several options (Zhang et al., 2020a), span extraction of the answer from the passage (Rajpurkar et al., 2016; 2018; Trischler et al., 2017), and free-style answer generation and summarization from the passage (Nguyen et al., 2016). MRC is widely applicable to numerous tasks rich in question-style queries, such as information retrieval and task-oriented conversations. For a detailed survey of this field, covering the recent research roadmap, datasets and future directions, please refer to Zhang et al. (2020c).

In this paper, we focus on span-extraction style MRC with unanswerable questions. Rajpurkar et al. (2018) introduced 50K+ unanswerable questions to construct the SQuAD2.0 dataset. The unanswerable questions were created by rewriting originally answerable questions through negation insertion or removal, antonym substitution, entity swap, mutual exclusion, and impossible conditions.
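For context, a widely used decision rule for handling unanswerable questions in span-extraction MRC (popularized by BERT-style models on SQuAD2.0) compares the best non-null span score against a no-answer score taken at the null position, abstaining when the latter wins by a tuned threshold. The sketch below illustrates that general rule; the function name, argument layout and threshold default are illustrative assumptions, not the mechanism of any particular system discussed here.

```python
def predict_answer(start_logits, end_logits, tokens,
                   max_answer_len=30, null_threshold=0.0):
    """Pick the best answer span, or abstain (return None) when the
    no-answer score beats the best span score by `null_threshold`.
    Index 0 is assumed to be the [CLS]-style null position."""
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = float("-inf"), None
    for i in range(1, len(tokens)):
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if best_span is None or null_score - best_score > null_threshold:
        return None  # predicted unanswerable
    i, j = best_span
    return " ".join(tokens[i:j + 1])
```

Tuning `null_threshold` on a development set trades precision on answerable questions against recall on unanswerable ones; the verifiers proposed in this paper aim to make that answerability decision with richer linguistic and contextual evidence.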
Plausible answers, which correspond to spans in the given passage, are attached to these unanswerable questions. Numerous verifiers have been proposed to score the answerability of questions. For example, Hu et al. (2019) proposed a read-then-verify system that explicitly verified the legitimacy of the predicted answer. An answer verifier was designed to decide whether or not the predicted answer is entailed by the input snippets (i.e., segments of the input passage). Their system achieved F1 scores of 74.8% and 74.2% on the SQuAD2.0 dev and test sets, respectively. Zhang et al. (2020b) proposed a

1 https://rajpurkar.github.io/SQuAD-explorer/

