BINDING LANGUAGE MODELS IN SYMBOLIC LANGUAGES

Abstract

Though end-to-end neural approaches have recently come to dominate NLP tasks in both performance and ease of use, they lack interpretability and robustness. We propose BINDER, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. Specifically, we employ GPT-3 Codex as the LM. In the parsing stage, with only a few in-context exemplars, Codex is able to identify the parts of the task input that cannot be answered by the original programming language, correctly generate API calls to prompt Codex to solve those parts, and place the API calls appropriately while remaining compatible with the original grammar. In the execution stage, Codex performs versatile functionalities (e.g., commonsense QA, information extraction) given proper prompts in the API calls. BINDER achieves state-of-the-art results on the WIKITABLEQUESTIONS and TABFACT datasets, with explicit output programs that aid human debugging. Notably, the previous best systems are all finetuned on tens of thousands of task-specific samples, while BINDER uses only dozens of annotations as in-context exemplars without any training.
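To make the two stages concrete, the following is a minimal, self-contained sketch of Binder-style execution over a toy table. The `QA('question'; column)` call syntax, the `stub_lm` function, and the table contents are illustrative assumptions, not the paper's actual implementation: a real system would have the LM both generate the program and answer the API calls, whereas here a hard-coded stub stands in for the LM.

```python
import re
import sqlite3

# Toy table standing in for a WikiTableQuestions-style table (assumed data).
ROWS = [("Canada", "North America"), ("France", "Europe"), ("Mexico", "North America")]

def stub_lm(question, value):
    # Placeholder for the LM (e.g., Codex) invoked by the QA API at execution time.
    if question == "is this in North America?":
        return "yes" if value == "North America" else "no"
    raise ValueError("unknown question")

def execute_binder_sql(query):
    """Execute a Binder-style SQL query containing QA('question'; column) calls.

    Each QA call is materialized as an extra column whose values the LM fills in,
    after which the rewritten query runs as ordinary SQL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, continent TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

    # Locate every QA('question'; column) span produced by the parsing stage.
    for i, m in enumerate(re.finditer(r"QA\('([^']*)';\s*(\w+)\)", query)):
        question, column = m.group(1), m.group(2)
        new_col = f"qa_{i}"
        conn.execute(f"ALTER TABLE t ADD COLUMN {new_col} TEXT")
        # Ask the LM once per distinct cell value and store its answer.
        for (val,) in conn.execute(f"SELECT DISTINCT {column} FROM t"):
            conn.execute(f"UPDATE t SET {new_col} = ? WHERE {column} = ?",
                         (stub_lm(question, val), val))
        # Bind the API call back into plain SQL by referencing the new column.
        query = query.replace(m.group(0), new_col)
    return [row[0] for row in conn.execute(query)]

result = execute_binder_sql(
    "SELECT name FROM t WHERE QA('is this in North America?'; continent) = 'yes'")
print(sorted(result))  # ['Canada', 'Mexico']
```

The sketch shows why the approach stays within the host language's grammar: once each LM call is resolved into a concrete column, the remaining query is executable by an unmodified SQL engine.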

1. INTRODUCTION

Performance on natural language processing tasks is dominated by neural end-to-end systems that directly map inputs to outputs (Devlin et al., 2019; Liu et al., 2019; Lewis et al., 2020; Raffel et al., 2020, i.a.). These end-to-end approaches are flexible and easy to use, but they lack interpretability and robustness. This stands in contrast to symbolic approaches that produce explicit intermediate representations such as logical forms, reasoning paths, or program code, which can then be executed to derive a final output (Zettlemoyer & Collins, 2005; Gulwani et al., 2017; Chen et al., 2019b, i.a.). The explicit intermediate form produced by these approaches, and its deterministic execution, make them more robust to input changes. However, their semantic coverage is limited by the affordances of the grammar of the chosen symbolic language (e.g., being unable to handle "North America?" in Fig. 1), leading to failures on diverse real-world questions, and annotating the intermediate forms requires expert knowledge and researcher labour.

A few works (Andreas et al., 2016; Gupta et al., 2019; Khot et al., 2021; Zhu et al., 2022, i.a.) have proposed combining neural modules with symbolic languages (neural-symbolic approaches) to leverage the advantages of both. However, they require elaborate human design of the symbolic language and calibration of the corresponding neural modules to tackle problems in a specific domain, along with large amounts of training data. More specifically, most of these works propose a task-specific symbolic language and corresponding modules that cover only the limited semantic phenomena of a specific task and domain; new languages and neural modules therefore have to be introduced when adapting them to new tasks or domains.

