PHRASETRANSFORMER: SELF-ATTENTION USING LOCAL CONTEXT FOR SEMANTIC PARSING

Abstract

Semantic parsing is a challenging task whose purpose is to convert a natural language utterance into a machine-understandable information representation. Recently, solutions based on Neural Machine Translation have achieved many promising results, especially the Transformer, thanks to its ability to learn long-range word dependencies. However, one drawback of adapting the original Transformer to semantic parsing is its lack of detail in expressing the information of a sentence. Therefore, this work proposes the PhraseTransformer architecture, which produces a more fine-grained meaning representation by learning phrase dependencies within the sentence. The main idea is to incorporate Long Short-Term Memory (LSTM) into the Self-Attention mechanism of the original Transformer to capture more local context of phrases. Experimental results show that the proposed model captures detailed meaning better than the Transformer, raises local context awareness, achieves strongly competitive performance on the Geo and MSParS datasets, and leads to SOTA performance on the Atis dataset among neural network methods.

1. INTRODUCTION

Semantic parsing is an important task that can be applied in many settings, such as question answering systems or search systems using natural language (Woods, 1973; Waltz & Goodman, 1977). For example, the sentence "which state borders hawaii" can be represented as a logical form (LF) in λ-calculus syntax: "(lambda $0 e (and (state:t $0) (next_to:t $0 hawaii)))". There are various strategies to address the semantic parsing task, such as constructing handcrafted rules (Woods, 1973; Waltz & Goodman, 1977; Hendrix et al., 1978), using Combinatory Categorial Grammar (CCG) (Zettlemoyer & Collins, 2005; 2007; Kwiatkowski et al., 2011), adapting statistical machine translation methods (Wong & Mooney, 2006; 2007), or applying Neural Machine Translation (Dong & Lapata, 2016; Jia & Liang, 2016; Dong & Lapata, 2018; Cao et al., 2019). The CCG method relies on alignments of sub-parts (lexicons or phrases) between a natural sentence and its corresponding logical form, and learns how best to combine these sub-parts. In more detail, the phrase "borders hawaii" is aligned to "(next_to:t $0 hawaii)" in the LF. In contrast, methods using Neural Machine Translation learn an encoder that represents a sentence as a vector and a decoder that converts that vector into an LF.

The current SOTA models are Sequence-to-Sequence with LSTM (Seq2seq) (Dong & Lapata, 2018; Cao et al., 2019) on Geo and Atis, and Transformer (Ge et al., 2019) on MSParS. These neural network methods work effectively without any handcrafted features. However, there is still room to improve performance by exploiting the meaning of local context in phrases. According to CCG methods, the semantic representation of a sentence is the combination of sub-meaning representations generated by phrases in the sentence. However, the Transformer architecture
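The central idea outlined above, feeding token embeddings through a recurrence so that the attention mechanism operates over phrase-level states rather than isolated tokens, can be illustrated with a minimal NumPy sketch. Note that the function names (`local_context`, `phrase_aware_attention`) and the simplified single-gate recurrence are illustrative assumptions for exposition, not the paper's actual LSTM-based implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_context(x, w_h, w_x):
    """Toy recurrent pass (a stand-in for the paper's LSTM): each
    position mixes its embedding with the previous hidden state, so the
    state at position t summarizes the local phrase ending at t."""
    h = np.zeros(x.shape[-1])
    states = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ w_x + h @ w_h)
        states.append(h)
    return np.stack(states)

def phrase_aware_attention(x, w_h, w_x):
    """Scaled dot-product self-attention in which queries and keys are
    built from the recurrent (phrase-aware) states; values remain the
    original token embeddings."""
    ctx = local_context(x, w_h, w_x)            # (T, d) phrase-aware states
    q, k, v = ctx, ctx, x
    scores = q @ k.T / np.sqrt(x.shape[-1])     # (T, T) attention logits
    return softmax(scores, axis=-1) @ v         # (T, d) contextualized output

rng = np.random.default_rng(0)
T, d = 5, 8                                     # 5 tokens, dimension 8
x = rng.standard_normal((T, d))
w_h = rng.standard_normal((d, d)) * 0.1
w_x = rng.standard_normal((d, d)) * 0.1
y = phrase_aware_attention(x, w_h, w_x)
print(y.shape)                                  # (5, 8)
```

Because each attention query now depends on a short left context rather than a single token, similarity scores compare phrase fragments (e.g. "borders hawaii") instead of individual words, which is the intuition behind the proposed architecture.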



Figure 1: Phrase alignments in PhraseTransformer.

