TRANSLATION MEMORY GUIDED NEURAL MACHINE TRANSLATION

Abstract

Many studies have shown that a Translation Memory (TM) can help improve the translation quality of neural machine translation (NMT). Existing approaches either employ an extra encoder to encode information from the TM or concatenate the source sentence and TM sentences as the encoder's input. These methods do not model the semantic relationship between the source sentence and the TM sentences. Meanwhile, the training corpus related to TM is limited, and the sentence-level retrieval approach further limits its scale. In this paper, we propose a novel method to combine the strengths of both TM and NMT. We treat the matched sentence pair of the TM as an additional signal and apply a single encoder, enhanced by a pre-trained language model (PLM), to encode the TM information and the source sentence together. Additionally, we extend the sentence-level retrieval method to an n-gram retrieval method that does not require calculating a similarity score. Further, we explore new methods to control the information flow from the TM to the NMT decoder. We validate our proposed methods on a mixed test set covering multiple domains. Experimental results demonstrate that the proposed methods significantly improve translation quality and show strong adaptation to unknown or new domains.

1. INTRODUCTION

Neural machine translation (NMT), an end-to-end approach, has achieved state-of-the-art translation performance on many language pairs (Vaswani et al., 2017; Wang et al., 2019). Usually, a trained NMT model translates a new sentence into the target language from scratch. However, human translators can quickly and accurately translate a sentence by reusing existing repetitive translation fragments from a translation memory (TM). Therefore, it is natural to use a TM to improve the translation quality of NMT. Typically, a TM consists of bilingual parallel sentence pairs (TM-source and TM-target) that are similar to the current sentence to be translated (Koehn & Senellart, 2010; Cao & Xiong, 2018). From statistical machine translation (SMT) to NMT, a variety of efforts have been made to integrate a TM into machine translation. Integrating TM information into NMT mainly involves two steps: TM retrieval and fusion of the TM information into the NMT network. For the fusion of TM and NMT, several attempts have already been made, and a commonly used integration approach is a multi-encoder structure. Cao & Xiong (2018) propose a simple method that employs a new encoder to encode the TM information to guide the decoding process, and Xia et al. (2019) use a graph-based encoder to pack the TM sentences into a graph. These methods all require an additional encoder structure, ignore the TM-source information, and encode only the TM-target information, which causes two problems. On the one hand, the additional encoder significantly increases the parameter scale of the network. On the other hand, the encoding of the TM-target information and that of the source sentence are isolated from each other, so the semantic connection between them is lost. For TM retrieval, various metrics can be used to estimate the similarity of two sentences.
We select the sentences with the highest similarity to the current source sentence from the TM database by calculating a similarity score. The retrieval approaches used in previous work usually calculate a sentence-level similarity score, such as Edit-Distance (Gu et al., 2017; Xia et al., 2019), an IDF-based similarity score (Bapna & Firat, 2019), or cosine similarity (Xu et al., 2020). Current work on TM experiments with relatively small datasets, usually only hundreds of thousands of sentences. One main reason is the use of sentence-level similarity. When
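As an illustration, sentence-level retrieval with an Edit-Distance score can be sketched as follows. This is a minimal sketch, not the exact formulation of any cited work: the word-level tokenization, the normalization by the longer sentence's length, and all function and variable names are our assumptions for illustration.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] holds the distance for the previous row
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            # deletion, insertion, or substitution (free if tokens match)
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (a[i - 1] != b[j - 1]))
            prev = cur
    return dp[n]

def fuzzy_match_score(src, tm_src):
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    a, b = src.split(), tm_src.split()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def retrieve(src, tm_database):
    """Return the (tm_source, tm_target) pair whose source side is most similar to src."""
    return max(tm_database, key=lambda pair: fuzzy_match_score(src, pair[0]))
```

Because every source sentence must be scored against every TM entry, this retrieval step scales linearly with the TM size, which is one reason sentence-level approaches are typically applied to databases of at most a few hundred thousand sentences.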

