EFFICIENT NEURAL MACHINE TRANSLATION WITH PRIOR WORD ALIGNMENT

Abstract

Prior word alignment has been shown to be helpful for translation, provided that the alignment is good enough and can be acquired conveniently. Traditionally, word alignment is learned as a by-product of statistical machine translation (SMT) models. In this paper, we propose a novel method that infuses prior word alignment information into neural machine translation (NMT) to provide hints about the target sentence at decoding time. Previous works with similar goals build dictionaries for specific domains, constrain the decoding process, or both. While effective to some extent, these methods can greatly slow decoding and hurt translation flexibility. Instead, this paper introduces an enhancement learning model that learns to directly replace specific source words with their target counterparts according to prior alignment information. The proposed model is then inserted into a neural MT model and augments the MT input with the additional target-side information from the learning model in an effective and more efficient way. Our method achieves BLEU improvements (up to 1.1) over a strong baseline model on English-to-Korean, English-to-German, and English-to-Romanian translation tasks.

1. INTRODUCTION

As neural machine translation (NMT) models have become the dominant approach to machine translation, the explicit word alignment model, an essential intermediate result of training most statistical machine translation (SMT) models (Koehn et al., 2003; Och & Ney, 2004; Ganchev et al., 2008), seems to be becoming increasingly obsolete. Prior research suggests that the attention mechanism of NMT systems takes over the role of the word alignment model in SMT systems (Bahdanau et al., 2014). However, word alignment extracted from the attention mechanism is far from gold alignment and even performs much worse than automatic word aligners such as FastAlign or GIZA++. In this study, we focus on the use of prior word alignment in NMT systems to improve translation performance. Given good enough known word alignment, replacing some words in the source sentence with semantically corresponding words in the target language leads to better or user-desired translation; this is also a well-known tip for using translators effectively. As in Figure 1, we can see that an open translation system (https://translate.google.com) generates a translation closer to the target sentence when some words of the target sentence are provided in the source sentence. In other words, a user can supply specific alignments, such as 공개 ↔ released and 사진 ↔ picture, to obtain a desired translation. The case in Figure 1 arises because word alignment between source and target sentences more or less holds no matter how the model acquires such alignment. However, not all word alignments help; only those that are good enough can truly enhance the model. When the concerned language pair shares a large vocabulary, such good enough alignments may be easily obtained and then conveniently exploited in the proposed early-substitution way.
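The early-substitution idea described above can be sketched as follows. This is a minimal hypothetical helper, not the paper's actual implementation: given a prior alignment that maps source words to known-good target counterparts, each aligned source word is replaced before the sentence is fed to the NMT model. The function name and the toy Korean-English alignment (built from the pairs in Figure 1) are illustrative assumptions.

```python
def substitute_with_alignment(source_tokens, prior_alignment):
    """Replace source tokens that have a known-good target-side alignment.

    source_tokens:   list of tokens in the source language
    prior_alignment: dict mapping a source word to its target counterpart
    """
    # Unaligned tokens pass through unchanged; aligned ones are swapped
    # for their target-language counterpart ("early substitution").
    return [prior_alignment.get(tok, tok) for tok in source_tokens]


# Toy example using the alignment pairs from Figure 1.
alignment = {"공개": "released", "사진": "picture"}
source = ["어제", "공개", "된", "사진"]
print(substitute_with_alignment(source, alignment))
# -> ['어제', 'released', '된', 'picture']
```

The substituted sentence then serves as the augmented NMT input, so the target-side hints are present without modifying the decoder.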
This work explores an effective way to identify such good enough alignments for enhancing neural machine translation. Previous studies of similar approaches can be largely divided into two categories: constrained decoding and augmenting the MT input with its corresponding target information. The former is to leverage




