WEAKLY SUPERVISED NEURO-SYMBOLIC MODULE NETWORKS FOR NUMERICAL REASONING

Abstract

Neural Module Networks (NMNs) have been quite successful in incorporating explicit reasoning as learnable modules in various question answering tasks, including the most generic form of numerical reasoning over text in Machine Reading Comprehension (MRC). However, to achieve this, contemporary NMNs need strong supervision for executing the query as a specialized program over reasoning modules, and they fail to generalize to more open-ended settings without such supervision. Hence we propose the Weakly-Supervised Neuro-Symbolic Module Network (WNSMN), trained with answers as the sole supervision for numerical-reasoning-based MRC. It learns to execute a noisy heuristic program, obtained from the dependency parse of the query, as discrete actions over both neural and symbolic reasoning modules, and is trained end-to-end in a reinforcement learning framework with discrete rewards from answer matching. On the numerical-answer subset of DROP, WNSMN outperforms NMN by 32% and the reasoning-free language model GenBERT by 8% in exact-match accuracy when trained under comparably weak supervision. This showcases the effectiveness and generalizability of modular networks that can handle explicit discrete reasoning over noisy programs in an end-to-end manner.

1. INTRODUCTION

End-to-end neural models have proven to be powerful tools for an expansive set of language and vision problems by effectively emulating input-output behavior. However, many real problems like Question Answering (QA) or Dialog need more interpretable models that can incorporate explicit reasoning in the inference. In this work, we focus on the most generic form of numerical reasoning over text, encompassed by the reasoning-based MRC framework. A particularly challenging setting for this task is where the answers are numerical in nature, as in the popular MRC dataset DROP (Dua et al., 2019). Figure 1 shows the intricacies involved in the task: (i) passage and query language understanding, (ii) contextual understanding of the passage dates and numbers, and (iii) application of quantitative reasoning (e.g., max, not) over dates and numbers to reach the final numerical answer. Three broad genres of models have proven successful on the DROP numerical reasoning task. First, large-scale pretrained language models like GenBERT (Geva et al., 2020) use a monolithic Transformer architecture and decode numerical answers digit by digit. Though they deliver mediocre performance when trained only on the target data, their competency derives from pretraining on massive synthetic data augmented with explicit supervision of the gold numerical reasoning. The second genre comprises reasoning-free hybrid models like NumNet (Ran et al., 2019), NAQANet (Dua et al., 2019), NABERT+ (Kinley & Lin, 2019), MTMSN (Hu et al., 2019), and NeRd (Chen et al., 2020). They explicitly incorporate numerical computations in the standard extractive QA pipeline by learning a multi-type answer predictor over different reasoning types (e.g., max/min, diff/sum, count, negate) and directly predicting the corresponding numerical expression, instead of learning to reason.
This is facilitated by exhaustively precomputing all possible outcomes of the discrete operations and augmenting the training data with reasoning-type supervision and numerical expressions that lead to the correct answer. Lastly, the most relevant class of models for this work are the modular networks for reasoning. Neural Module Networks (NMN) (Gupta et al., 2020) is the first explicit-reasoning-based QA model, which parses the query into a specialized program and executes it step-wise over learnable reasoning modules. However, to do so, apart from the exhaustive precomputation of all discrete operations, it also needs more fine-grained supervision of the gold program and its execution. While being more pragmatic and richer in interpretability, both modular and hybrid networks are thus tightly coupled with this additional supervision. For instance, the hybrid models cannot learn without it, and while NMN is the first to enable learning from the QA pair alone, it still needs finer-grained supervision for at least a part of the training data. With this, it manages to supersede the SoTA models NABERT and MTMSN on a carefully chosen subset of DROP. However, NMN generalizes poorly to more open-ended settings where such supervision is not easy to handcraft.

Need for symbolic reasoning. One striking characteristic of the modular methods is that they avoid discrete reasoning by employing only learnable modules with an exhaustively precomputed space of outputs. While they perform well on DROP, their modeling complexity grows arbitrarily with more complex non-linear numerical operations (e.g., exp, log, cos). In contrast, symbolic modular networks that execute the discrete operations are possibly more robust and pragmatic in this respect, remaining unaffected by the operation complexity.
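As a rough illustration of what this exhaustive precomputation entails, the sketch below (function names and the small operation set are our own, not taken from any of the cited systems) enumerates outcomes of a few discrete operations over the passage numbers and keeps the expressions whose value matches the gold answer:

```python
from itertools import combinations

def precompute_outcomes(numbers):
    """Enumerate outcomes of a few discrete operations over the
    passage numbers (illustrative; real systems cover more types)."""
    outcomes = {}                                # expression -> value
    for n in numbers:
        outcomes[f"-{n}"] = -n                   # negate
    for a, b in combinations(numbers, 2):
        outcomes[f"{a}+{b}"] = a + b             # sum
        outcomes[f"{a}-{b}"] = a - b             # diff (both orders)
        outcomes[f"{b}-{a}"] = b - a
    outcomes["max"] = max(numbers)
    outcomes["min"] = min(numbers)
    outcomes["count"] = len(numbers)
    return outcomes

def expressions_for_answer(numbers, answer):
    """Expressions whose value matches the gold answer; these act
    as (possibly spurious) supervision for the answer predictor."""
    return [e for e, v in precompute_outcomes(numbers).items() if v == answer]
```

For the passage numbers of Figure 1, `expressions_for_answer([23, 26, 42], 42)` recovers only the max expression; with many passage numbers, several spurious expressions typically match the answer, which is exactly the noise these models must absorb.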
Such discrete reasoning has indeed been incorporated for simpler, well-structured tasks like math word problems (Koncel-Kedziorski et al., 2016) or KB/Table-QA (Zhong et al., 2017; Liang et al., 2018; Saha et al., 2019), with Deep Reinforcement Learning (RL) for end-to-end training. MRC, however, needs a more generalized framework of modular neural networks involving fuzzier reasoning over noisy entities extracted from open-ended passages. In view of this, we propose the Weakly-Supervised Neuro-Symbolic Module Network (WNSMN):
• A first attempt at numerical-reasoning-based MRC, trained with answers as the sole supervision;
• Based on a generalized framework of dependency parsing of queries into noisy heuristic programs;
• End-to-end training of neuro-symbolic reasoning modules in an RL framework with discrete rewards.
To concretely compare WNSMN with the contemporary NMN, consider the example in Figure 1. In contrast to our generalized query parsing, NMN parses the query into the program form MAX(FILTER(FIND('Carpenter'), 'goal')), which is executed step-wise by different learnable modules over an exhaustively precomputed output set. To train the network, it employs various forms of strong supervision, such as gold program operations and gold query-span attention at each step of the program, and gold execution, i.e., supervision of the passage numbers (23, 26, 42) on which to execute the MAX operation. While NMN can only handle the 6 reasoning categories that its supervision was tailored to, WNSMN targets the full DROP subset with numerical answers (called DROP-num), which involves more diverse reasoning over more open-ended questions. We empirically compare WNSMN on DROP-num with the SoTA NMN and GenBERT, which allow learning with partial or no strong supervision.
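To make the contrast concrete, the toy sketch below executes the example program MAX(FILTER(FIND('Carpenter'), 'goal')) symbolically over a hand-made passage representation; the module signatures and the sentence/number pairing are purely illustrative, not NMN's actual modules:

```python
# Hand-made passage representation: each sentence paired with the
# numbers it mentions (illustrative, loosely mirroring Figure 1).
passage = [
    ("Carpenter kicked a 23-yard field goal", [23]),
    ("Carpenter made a 42-yard field goal",   [42]),
    ("Smith ran for a 26-yard touchdown",     [26]),
]

def FIND(arg):
    """Sentences mentioning the argument span."""
    return [s for s in passage if arg in s[0]]

def FILTER(sents, cond):
    """Keep sentences that also match the filtering condition."""
    return [s for s in sents if cond in s[0]]

def MAX(sents):
    """Discrete max over the numbers attached to the sentences."""
    return max(n for _, nums in sents for n in nums)

answer = MAX(FILTER(FIND("Carpenter"), "goal"))   # the example program
```

A symbolic MAX like this is unaffected by the operation's complexity, whereas a learnable module must have all candidate outputs precomputed in advance.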
Our results showcase that the proposed WNSMN achieves 32% better accuracy than NMN in the absence of one or more types of strong supervision, and performs 8% better than GenBERT when the latter is fine-tuned only on DROP in a comparable setup, i.e., without additional synthetic data carrying explicit supervision.
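The discrete-reward training signal can be sketched with a minimal REINFORCE loop: an operation is sampled from a categorical policy, executed symbolically, and the policy logits are updated only when the executed answer exactly matches the gold one. The operation set and toy setup below are hypothetical stand-ins, not the paper's actual modules:

```python
import math, random

# Hypothetical discrete action space of symbolic operations.
OPS = {"max": max, "min": min, "count": len, "sum": sum}
op_names = list(OPS)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, numbers, gold, lr=0.1):
    """Sample an operation, execute it symbolically, and update the
    logits with the REINFORCE gradient of the 0/1 match reward."""
    probs = softmax(logits)
    i = random.choices(range(len(op_names)), weights=probs)[0]
    pred = OPS[op_names[i]](numbers)
    reward = 1.0 if pred == gold else 0.0        # discrete exact-match reward
    # d log pi(i) / d logit_j = 1[j == i] - probs[j]
    return [l + lr * reward * ((1.0 if j == i else 0.0) - probs[j])
            for j, l in enumerate(logits)]

random.seed(0)
logits = [0.0] * len(op_names)
for _ in range(500):
    logits = reinforce_step(logits, [23, 26, 42], gold=42)
# after training, the logit of "max" dominates: only it earns reward
```

Because the reward is discrete and only observed after execution, no gradient flows through the symbolic operation itself; this is what lets the sampled actions remain fully discrete.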

2. MODEL: WEAKLY SUPERVISED NEURO-SYMBOLIC MODULE NETWORK

We now describe our proposed WNSMN, which learns to infer the answer based on weak supervision from the QA pair alone by generating the program form of the query and executing it through explicit reasoning.

Parsing Query into Programs. To keep the framework generic, we use a simplified representation of the Stanford dependency parse tree (Chen & Manning, 2014) of the query to get a generalized program (Appendix A.5). First, a node is constructed for the subtree rooted at each child of the root, by merging its descendants in the original word order. Next, an edge is added from the left-most node (which we call the root clause) to every other node. Then, traversing left to right, each node is organized into a step of a program having a linear flow. For example, Figure 1 shows the program obtained from the dependency parse of the sample query.



Figure 1: Example (passage, query, answer) from DROP and outline of our method: executing a noisy program obtained from the dependency parse of the query by learning date/number entity-specific cross attention, and sampling and executing discrete operations on entity arguments to reach the answer.
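The query-parsing heuristic of Section 2 (one node per child subtree of the root, a root clause linked to every other node, left-to-right linearization) can be sketched as below; the hand-coded dependency parse stands in for the Stanford parser and is illustrative only:

```python
# Each token: (index, word, head_index); head -1 marks the parse root.

def subtree_words(tokens, root_idx):
    """Merge the subtree rooted at root_idx in original word order."""
    keep, frontier = set(), {root_idx}
    while frontier:
        keep |= frontier
        frontier = {i for i, _, h in tokens if h in keep} - keep
    return " ".join(w for i, w, _ in tokens if i in keep)

def heuristic_program(tokens):
    """Linearize child subtrees into program steps, each linked to
    the left-most node (the root clause)."""
    root = next(i for i, _, h in tokens if h == -1)
    children = sorted(i for i, _, h in tokens if h == root)
    nodes = [subtree_words(tokens, c) for c in children]
    root_clause, rest = nodes[0], nodes[1:]
    # each remaining node becomes a step that takes the root clause as input
    return [(step, root_clause) for step in rest] or [(root_clause, None)]

# Hand-coded toy parse of "What was the longest goal by Carpenter"
tokens = [
    (0, "What", 1), (1, "was", -1), (2, "the", 4), (3, "longest", 4),
    (4, "goal", 1), (5, "by", 6), (6, "Carpenter", 4),
]
program = heuristic_program(tokens)
```

Here the left-most node "What" becomes the root clause, and the merged subtree "the longest goal by Carpenter" becomes the single program step it feeds into; longer queries simply yield more steps in left-to-right order.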

