GRAPPA: GRAMMAR-AUGMENTED PRE-TRAINING FOR TABLE SEMANTIC PARSING

Abstract

We present GRAPPA, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG). We pre-train GRAPPA on the synthetic data to inject structural properties important for table semantic parsing into the pre-trained language model. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) on several existing table-and-language datasets to regularize the pre-training process. Our proposed pre-training strategy is highly data-efficient. When incorporated with strong base semantic parsers, GRAPPA achieves new state-of-the-art results on four popular fully supervised and weakly supervised table semantic parsing tasks. The pre-trained embeddings can be downloaded at https://huggingface.co/Salesforce/grappa_large_jnt.

1. INTRODUCTION

Tabular data serve as an important information source for human decision makers in many domains, such as finance, health care, and retail. While tabular data can be efficiently accessed via the structured query language (SQL), a natural language interface allows such data to be accessible to a wider range of non-technical users. As a result, table semantic parsing, which maps natural language queries over tabular data to formal programs, has drawn significant attention in recent years. Recent pre-trained language models (LMs) such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019) achieve tremendous success on a spectrum of natural language processing tasks, including semantic parsing (Zettlemoyer & Collins, 2005; Zhong et al., 2017; Yu et al., 2018b). These advances have shifted the focus from building domain-specific semantic parsers (Zettlemoyer & Collins, 2005; Artzi & Zettlemoyer, 2013; Berant & Liang, 2014; Li & Jagadish, 2014) to cross-domain semantic parsing (Zhong et al., 2017; Yu et al., 2018b; Herzig & Berant, 2018; Dong & Lapata, 2018; Wang et al., 2020; Lin et al., 2020). Despite such significant gains, the overall performance on complex benchmarks such as SPIDER (Yu et al., 2018b) and WIKITABLEQUESTIONS remains limited, even when integrating representations of current pre-trained language models. Because such tasks require generalization to new databases/tables and more complex programs (e.g., SQL), we hypothesize that current pre-trained language models are not sufficient for them. First, language models pre-trained on unstructured text such as Wikipedia and Book Corpus are exposed to a significant domain shift when directly applied to table semantic parsing, where jointly modeling the relation between utterances and structured tables is crucial. Second, conventional pre-training objectives do not consider the underlying compositionality of the data (e.g., questions and SQL queries) in table semantic parsing.
To close this gap, we seek to learn contextual representations jointly from structured tabular data and unstructured natural language sentences, with objectives oriented towards table semantic parsing. Following prior work on grammar-based data augmentation (Wang et al., 2015b; Jia & Liang, 2016; Herzig & Berant, 2018; Andreas, 2020), we induce a synchronous context-free grammar (SCFG) for mapping natural language to SQL queries from existing text-to-SQL datasets, which covers the most commonly used question-SQL patterns. As shown in Figure 1, from a text-to-SQL example we can create a question-SQL template by abstracting over mentions of schema components (tables and fields), values, and SQL operations. By executing this template on randomly selected tables, we can create a large number of synthetic question-SQL pairs. We train GRAPPA on these synthetic question-SQL pairs and their corresponding tables using a novel text-schema linking objective that predicts the syntactic role of a table column in the SQL for each pair. This way we encourage the model to identify table schema components that can be grounded to logical form constituents, which is critical for most table semantic parsing tasks. To prevent overfitting to the synthetic data, we include a masked language modeling (MLM) loss on several large-scale, high-quality table-and-language datasets, carefully balancing between preserving the original natural language representations and injecting the compositional inductive bias through our synthetic data. We pre-train GRAPPA on 475k synthetic examples and 391.5k examples from existing table-and-language datasets. Our approach dramatically reduces training time and GPU cost. We evaluate on four popular semantic parsing benchmarks in both fully supervised and weakly supervised settings. GRAPPA consistently achieves new state-of-the-art results on all of them, significantly outperforming all previously reported results.
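The template-instantiation step described above can be sketched as follows. The template strings, role slots, toy table, and helper names here are illustrative assumptions for exposition, not the actual grammar or data used in the paper (and no SQL value quoting is attempted):

```python
import random

# A hypothetical question-SQL template abstracted from one text-to-SQL
# example: {col}/{col2} are column slots, {tab} a table slot, {op} an
# aggregation slot, and {val} a value slot.
TEMPLATE = {
    "question": "What is the {op_text} {col} of {tab} where {col2} is {val}?",
    "sql": "SELECT {op}({col}) FROM {tab} WHERE {col2} = {val}",
}

OPS = [("maximum", "MAX"), ("minimum", "MIN"), ("average", "AVG")]


def instantiate(template, table):
    """Fill a question-SQL template with components sampled from one table."""
    col, col2 = random.sample(table["columns"], 2)  # two distinct columns
    op_text, op = random.choice(OPS)
    val = random.choice(table["values"][col2])      # a value seen in col2
    slots = {"op_text": op_text, "op": op, "col": col,
             "col2": col2, "tab": table["name"], "val": val}
    return (template["question"].format(**slots),
            template["sql"].format(**slots))


# A toy table standing in for one of the crawled high-quality tables.
table = {
    "name": "singers",
    "columns": ["age", "net_worth", "country"],
    "values": {"age": [25, 40], "net_worth": [1000000], "country": ["France"]},
}

question, sql = instantiate(TEMPLATE, table)
```

Executing many such templates over many tables yields aligned question-SQL pairs in which every column mention also carries a known syntactic role (e.g., appearing in SELECT or WHERE), which is exactly the supervision the text-schema linking objective consumes.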

2. METHODOLOGY

2.1 MOTIVATION

Semantic parsing data is compositional because utterances are usually related to formal representations such as logical forms and SQL queries. Numerous prior works (Berant & Liang, 2014; Wang et al., 2015a; Jia & Liang, 2016; Iyer et al., 2017; Andreas, 2020) have demonstrated the benefits of augmenting data using context-free grammars. The augmented examples can be used to teach the model to generalize beyond the given training examples. However, data augmentation becomes more complex and less beneficial when applied to generate data for arbitrary domains. A growing body of work (Zhang et al., 2019b; Herzig et al., 2020b; Campagna et al., 2020; Zhong et al., 2020) shows that utilizing augmented data does not always result in a significant performance gain.
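To make the grammar-based augmentation idea concrete, the sketch below recursively samples utterances from a toy context-free grammar. The non-terminals and rules are invented for illustration; they are far simpler than any grammar induced from real text-to-SQL data:

```python
import random

# A toy context-free grammar: keys are non-terminals, values are lists of
# candidate right-hand sides (each a list of symbols). Anything not in the
# dictionary is treated as a terminal token.
GRAMMAR = {
    "Q": [["show", "COL", "COND"]],
    "COL": [["age"], ["salary"]],
    "COND": [["above", "NUM"], ["below", "NUM"], []],
    "NUM": [["10"], ["100"]],
}


def sample(symbol, grammar):
    """Recursively expand a symbol into a flat list of terminal tokens."""
    if symbol not in grammar:
        return [symbol]
    tokens = []
    for sym in random.choice(grammar[symbol]):
        tokens.extend(sample(sym, grammar))
    return tokens


utterance = " ".join(sample("Q", GRAMMAR))
```

Because rules compose, even this tiny grammar produces utterance patterns (e.g., with or without a condition) that need not all appear in the seed data, which is the source of the compositional generalization the paragraph above refers to.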



Figure 1: An overview of the GRAPPA pre-training approach. We first induce an SCFG from examples in SPIDER. We then sample from this grammar over a large number of tables to generate new synthetic examples. Finally, GRAPPA is pre-trained on the synthetic data using a SQL semantic loss and on a small amount of table-related utterances using an MLM loss.
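The two objectives in the figure, a SQL semantic (column-role) loss on synthetic pairs and an MLM loss on real table-related utterances, can be combined as a weighted sum. The NumPy sketch below uses randomly generated logits and a made-up role label set purely to show the shape of the computation; the actual model computes these losses over RoBERTa contextual representations:

```python
import numpy as np

# Hypothetical label set for the text-schema linking objective: the syntactic
# role each column plays in the paired SQL, or no role at all.
COLUMN_ROLES = ["none", "select", "where", "group_by", "order_by"]


def cross_entropy(logits, labels):
    """Mean cross-entropy given unnormalized logits (rows) and int labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def grappa_loss(col_logits, col_labels, mlm_logits, mlm_labels, alpha=1.0):
    """Weighted sum of the column-role loss and the MLM loss."""
    ssp = cross_entropy(col_logits, col_labels)   # synthetic question-SQL pairs
    mlm = cross_entropy(mlm_logits, mlm_labels)   # real table-and-language text
    return ssp + alpha * mlm


# Toy batch: 3 columns classified over 5 roles, and 4 masked tokens
# predicted over a vocabulary of size 10.
rng = np.random.default_rng(0)
loss = grappa_loss(rng.normal(size=(3, 5)), np.array([0, 1, 2]),
                   rng.normal(size=(4, 10)), np.array([3, 7, 1, 0]))
```

The weighting factor (here `alpha`) reflects the balancing act described in the introduction: too much weight on the synthetic objective overfits to grammar artifacts, while too little fails to inject the compositional inductive bias.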

