COMPOSITIONAL SEMANTIC PARSING WITH LARGE LANGUAGE MODELS

Abstract

Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabularies and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based syntactic parsing, then uses this decomposition to select appropriate exemplars and to sequentially generate the semantic parse. This method allows us to set a new state of the art for CFQ while requiring only 1% of the training data used by traditional approaches. Due to the general nature of our approach, we expect similar efforts will lead to new results in other tasks and domains, especially for knowledge-intensive applications.

1. INTRODUCTION

Compositionality is a key part of human intelligence, as it allows us to understand and produce a potentially infinite number of novel combinations of known components (Chomsky, 1957; Montague, 1970; Lake et al., 2017). In contrast, standard neural sequence models, such as transformers and recurrent neural networks, often fail to capture the compositional structure of the problem domain and thus fail to generalize compositionally (Keysers et al., 2020; Furrer et al., 2020). Prior efforts to improve compositional generalization primarily rely on specialized architectures or training procedures (Lake, 2019; Chen et al., 2020; Nye et al., 2020; Andreas, 2020; Conklin et al., 2021; Akyürek et al., 2021; Liu et al., 2021). Although effective, these can be task-specific. Even more general-purpose methods that rely on data augmentation are limited in the class of data they can support (Shaw et al., 2021; Qiu et al., 2022a). Prompting, on the other hand, is sufficiently flexible and, with the recent advancement of large-scale pretrained language models (LLMs), has become an effective and generic approach to a wide range of language understanding problems (Brown et al., 2020). Prompting now performs on par with or better than model finetuning in many cases (Wei et al., 2022b; Chowdhery et al., 2022; Wei et al., 2022a; Kojima et al., 2022; Ahn et al., 2022), and might be suitable for improving language model performance on compositional generalization. In particular, recent work (Zhou et al., 2022) found that least-to-most prompting shows strong potential for adapting LLMs to compositional generalization, achieving 99.7% accuracy on SCAN, a commonly used compositional generalization benchmark. Least-to-most prompting decomposes each problem into a series of subproblems, then solves them sequentially, one after another.
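To make the two-stage structure of least-to-most prompting concrete, the following is a minimal sketch. The functions `toy_decompose` and `toy_solve` are hypothetical, deterministic stand-ins for the two LM prompts (a decomposition prompt and a solution prompt); a real system would issue LLM calls in their place, and the SCAN-like command is purely illustrative.

```python
def least_to_most(question, decompose, solve):
    """Least-to-most prompting: first decompose the problem into an
    ordered list of subproblems, then solve them sequentially, feeding
    each solved subproblem back into the context for the next one."""
    subproblems = decompose(question)       # stage 1: decomposition
    context = []                            # accumulated (sub, answer) pairs
    for sub in subproblems:
        answer = solve(sub, context)        # stage 2: sequential solution
        context.append((sub, answer))
    return context[-1][1]                   # answer to the full problem

# Hypothetical stand-ins for the two LM prompts (illustrative only).
def toy_decompose(question):
    # "jump twice" -> ["jump", "jump twice"]: prefixes from simple to full.
    words = question.split()
    return [" ".join(words[:i + 1]) for i in range(len(words))]

def toy_solve(sub, context):
    prior = dict(context)
    if sub.endswith(" twice"):
        base = prior[sub[: -len(" twice")]]  # reuse the earlier answer
        return base + " " + base
    return sub.upper()                       # primitive command -> action
```

With these stand-ins, `least_to_most("jump twice", toy_decompose, toy_solve)` first maps "jump" to an action token and then reuses that answer to expand "jump twice", mirroring how the context of solved subproblems is carried forward in the prompt.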
However, SCAN is an artificial task built upon a synthetic language with a tiny vocabulary and is generated from a small set of grammar rules, and it is unclear whether such strong results transfer to more realistic tasks based on larger vocabularies and more complicated grammars (Furrer et al., 2020). Additional challenges arise when applying least-to-most prompting to more realistic semantic parsing benchmarks. Among other things, such tasks may require information beyond what fits in a single prompt. Moreover, decomposing a problem is more difficult than with SCAN, a difficulty exacerbated by constituents that cannot be translated independently of their context. We address these challenges with dynamic least-to-most prompting, a generic refinement of least-to-most prompting that involves the following steps: (1) tree-structured decomposition of natural language inputs through LM-predicted syntactic parsing, (2) using the decomposition to dynamically select exemplars, and (3) linearizing the decomposition tree and prompting the model to sequentially generate answers to subproblems.
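The three steps can be sketched as follows. This is a simplified illustration, not the paper's implementation: `Node`, `linearize`, and `select_exemplars` are hypothetical names, the decomposition tree is assumed to be already predicted by the LM, and the substring-overlap heuristic is a stand-in for the actual exemplar-matching procedure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in the (LM-predicted) tree-structured decomposition."""
    text: str
    children: list = field(default_factory=list)

def linearize(node):
    """Step 3 (ordering): post-order traversal, so every subconstituent
    is listed, and hence solved, before the phrase that contains it."""
    order = []
    for child in node.children:
        order.extend(linearize(child))
    order.append(node.text)
    return order

def select_exemplars(subphrases, pool, k=2):
    """Step 2 (illustrative heuristic): rank candidate exemplars by how
    many subphrases of the decomposition they contain, keeping the top k."""
    def overlap(entry):
        return sum(phrase in entry for phrase in subphrases)
    return sorted(pool, key=overlap, reverse=True)[:k]
```

For an input such as "Who directed M1 and edited M2", linearization yields the constituents "directed M1" and "edited M2" before the full question, and exemplar selection favors pool entries that share those constituents, so the prompt covers exactly the structures the model must compose.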

