SCORE: PRE-TRAINING FOR CONTEXT REPRESENTATION IN CONVERSATIONAL SEMANTIC PARSING

Abstract

Conversational Semantic Parsing (CSP) is the task of converting a sequence of natural language queries to formal language (e.g., SQL, SPARQL) that can be executed against a structured ontology (e.g., databases, knowledge bases). To accomplish this task, a CSP system needs to model the relation between the unstructured language utterances and the structured ontology while representing the multi-turn dynamics of the dialog. Pre-trained language models (LMs) are the state of the art for various natural language processing tasks. However, existing pre-trained LMs that use language modeling training objectives over free-form text have limited ability to represent natural language references to contextual structural data. In this work, we present SCORE, a new pre-training approach for CSP tasks designed to induce representations that capture the alignment between the dialogue flow and the structural context. We demonstrate the broad applicability of SCORE to CSP tasks by combining SCORE with strong base systems on four different tasks (SPARC, COSQL, MWOZ, and SQA). We show that SCORE improves the performance of all these base systems by a significant margin and achieves state-of-the-art results on three of them.

1. INTRODUCTION

The goal of task-oriented dialog systems is to assist the user in completing a certain task by performing an action or retrieving relevant information (Tur & Mori, 2011). They are often built on top of a structured ontology grounded in a knowledge base, a database, or a set of API calls. This is in contrast to open-domain dialog systems (also referred to as chit-chat systems), where the goal is to maximize engagement with users in open-ended conversations (Jafarpour et al., 2010; Ritter et al., 2011). A key component of task-oriented conversational systems is Conversational Semantic Parsing (CSP), which converts each utterance in the dialog into a formal language query (e.g., SQL, SPARQL) that can be executed against the structured ontology.

CSP has been extensively studied in several academic and industrial research settings such as dialog systems (e.g., dialog state tracking in MWOZ (Budzianowski et al., 2018)), interacting with physical agents (e.g., (Chai et al., 2018)), context-dependent semantic parsing (e.g., SPARC (Yu et al., 2019b)), SQL-grounded state tracking (e.g., COSQL (Yu et al., 2019a)), and sequential question answering (e.g., SQA (Iyyer et al., 2017)). These settings differ in some respects, but they share the same overall objective and key challenge: how to jointly represent the natural language utterances and the underlying structured ontology while taking into consideration the multi-turn dynamics of the dialog.

Similar to many other natural language tasks, recent work in CSP has significantly benefited from advances in language model pre-training. However, existing general-purpose pre-trained language models, e.g., BERT (Devlin et al., 2019), are pre-trained on free-form text data using language model objectives. This limits their ability to model the structural context or the multi-turn dynamics of dialogs, and presents an opportunity to improve pre-trained LMs to specifically address these limitations for CSP tasks.
Recent work has demonstrated the benefits of adapting pre-trained LMs to downstream tasks. In contrast to open-domain dialogs, CSP datasets are usually much smaller due to the difficulty and expense of obtaining and labeling data (mapping natural language utterances to formal language). Unlike most prior work on contextualized LMs, which are pre-trained on free text, we propose to train SCORE on synthesized conversational semantic parsing data, motivated by the observation that questions in CSP tasks are more compositional than free text since they can be mapped into formal representations. SCORE uses multiple training objectives that aim to ground utterances into the schema of the underlying ontology and to model the relationship between different utterances in the multi-turn conversation. In this way, SCORE can effectively inject structural and conversational inductive biases into LMs that translate to many CSP tasks. SCORE uses an order-of-magnitude smaller dataset for the second stage of pre-training, does not require changes to the pre-trained model architecture, can be used as a drop-in replacement for general pre-trained LMs with any semantic parsing model, and can be used out-of-the-box in many CSP tasks.

We apply SCORE to four different CSP tasks: (1) sequential text-to-SQL (SPARC), (2) conversational text-to-SQL (COSQL), (3) dialog state tracking (MWOZ), and (4) weakly-supervised sequential question answering (SQA). The four tasks represent different scenarios, types of ontologies, supervision signals, system responses, and domains (see Table 1 for a detailed comparison and Figure 1 for examples).
We demonstrate that: (1) SCORE training objectives can effectively incorporate synthesized data, (2) a single pre-trained SCORE model can be used for several CSP tasks and can be combined with many baseline systems with different model architectures, and (3) SCORE significantly improves all baseline systems and achieves new state-of-the-art results on three benchmarks (SPARC, COSQL, and MWOZ) and comparable performance to the state of the art on the fourth (SQA).

2. APPROACH

The key challenge of CSP is to capture the relationship between the natural language utterance and the structured ontology within the multi-turn dialog dynamics. To this end, we inject structural and conversational inductive biases into SCORE by introducing two objective functions: Column Contextual Semantics (CCS) and Turn Contextual Switch (TCS). Because the size of existing semantic parsing datasets is limited, we produce synthesized data for pre-training SCORE by sampling from a context-free grammar induced from complex text-to-SQL examples in different domains. Moreover, to prevent SCORE from overfitting to the linguistic patterns of our synthesized data, we use the Masked Language Modeling (MLM) objective on human-generated utterances as regularization.
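The overall pre-training signal described above combines three terms: CCS and TCS on synthesized data, plus MLM on human-generated utterances as a regularizer. A minimal sketch of such a combination is below; the function name and the equal default weights are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: combining SCORE's three pre-training objectives
# into one scalar loss. Weights are assumed, not taken from the paper.

def score_pretraining_loss(mlm_loss: float, ccs_loss: float, tcs_loss: float,
                           w_mlm: float = 1.0, w_ccs: float = 1.0,
                           w_tcs: float = 1.0) -> float:
    """Weighted sum of:
    - MLM on human-generated utterances (regularization),
    - Column Contextual Semantics (CCS) on synthesized data,
    - Turn Contextual Switch (TCS) on synthesized data."""
    return w_mlm * mlm_loss + w_ccs * ccs_loss + w_tcs * tcs_loss


# Example: per-batch losses from the three objectives.
total = score_pretraining_loss(mlm_loss=0.8, ccs_loss=1.2, tcs_loss=0.5)
```

In practice each term would be a tensor produced by the model's output heads and the sum would be backpropagated jointly; the scalar version above only illustrates the weighting scheme.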

Task Definition

In CSP, at each turn t, we aim to produce a formal representation q_t given the current utterance u_t, the interaction history h_t = [u_1, u_2, ..., u_{t-1}], and the schema c (table and column names, slots, etc.) of the target database (ontology) d. To cover different problem variants, we



[Figure 1 (excerpt): example task-oriented dialog ("Usr: I am looking for a cheap restaurant in the centre of the city" / "Sys: There is a cheap chinese restaurant called Dojo Noodle Bar." / "Usr: Yes please, for 8 people at 18:30 on Thursday"), and the serialized multi-turn text-to-SQL model input: "<s> also show the names of their publishers <s> who are … authors <s> find <mask> … books </s> author id </s> author name </s> … </s> sale amount".]
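The serialized input shown in Figure 1 concatenates the dialog turns (most recent first) with `<s>` separators, followed by the schema's column names separated by `</s>`. The sketch below reconstructs that format; the function name and the exact token placement are assumptions inferred from the figure, not a specification from the paper.

```python
# Illustrative sketch of the Figure 1 input serialization: dialog turns
# joined with <s>, then schema columns joined with </s>. Token ordering
# is an assumption based on the figure.

def serialize(utterances, columns):
    """utterances: dialog turns, most recent first;
    columns: column names of the target schema."""
    dialog = "<s> " + " <s> ".join(utterances)
    schema = " </s> ".join(columns)
    return f"{dialog} </s> {schema}"


example = serialize(
    ["also show the names of their publishers",
     "who are their authors",
     "find the names of the top 3 highest sales books"],
    ["author id", "author name", "sale amount"],
)
```

This single flat sequence lets a standard pre-trained encoder attend jointly over the conversation history and the schema, which is what the CCS and TCS objectives rely on.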

