TRANSLATION MEMORY GUIDED NEURAL MACHINE TRANSLATION

Abstract

Many studies have shown that Translation Memory (TM) can help improve the translation quality of neural machine translation (NMT). Existing approaches either employ an extra encoder to encode information from the TM or concatenate the source sentence and TM sentences as the encoder's input. These methods do not model the semantic relationship between the source sentence and the TM sentences. Meanwhile, the training corpora used in TM-related work are limited, and sentence-level retrieval further limits their scale. In this paper, we propose a novel method to combine the strengths of both TM and NMT. We treat the matched sentence pair from the TM as an additional signal and apply a single encoder, enhanced by a pre-trained language model (PLM), to encode the TM information and the source sentence together. Additionally, we extend sentence-level retrieval to an n-gram retrieval method that does not require computing a similarity score. Further, we explore new methods to control the information flow from the TM to the NMT decoder. We validate the proposed methods on a mixed test set covering multiple domains. Experimental results demonstrate that the proposed methods significantly improve translation quality and show strong adaptation to unknown or new domains.

1. INTRODUCTION

Neural machine translation (NMT), an end-to-end approach, has achieved state-of-the-art translation performance on many language pairs (Vaswani et al., 2017; Wang et al., 2019). Usually, a trained NMT model translates a new sentence into the target language from scratch. Human translators, however, can quickly and accurately translate a sentence by reusing repetitive translation fragments stored in a translation memory (TM). It is therefore natural to use a TM to improve the translation quality of NMT. Typically, a TM consists of bilingual parallel sentence pairs (TM-source and TM-target) that are similar to the current sentence to be translated (Koehn & Senellart, 2010; Cao & Xiong, 2018). From statistical machine translation (SMT) to NMT, a variety of efforts have been made to integrate a TM into machine translation. Integrating TM information into NMT mainly involves two steps: TM retrieval and fusion of the TM information with the NMT network. For the fusion step, several attempts have already been made, and a commonly used approach is a multi-encoder structure. Cao & Xiong (2018) propose a simple method that employs a new encoder to encode the TM information and guide the decoding process, and Xia et al. (2019) use a graph-based encoder to pack the TM sentences into a graph. These methods all require an additional encoder, ignore the TM-source information, and encode only the TM-target information, which causes two problems. On the one hand, the additional encoder significantly increases the parameter scale of the network. On the other hand, the TM-target information and the source sentence are encoded in isolation, so the semantic connection between them is lost. For TM retrieval, various metrics can be used to estimate the similarity score of two sentences.
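As an illustration of such sentence-level metrics, a common fuzzy-match score is based on word-level edit distance. The sketch below (function names and the exact normalization are our own, not taken from any of the cited systems) computes this score between two tokenized sentences:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[-1]

def fuzzy_match_score(src, tm_src):
    """Sentence-level similarity in [0, 1]: 1 - ED / max sentence length."""
    if not src and not tm_src:
        return 1.0
    return 1.0 - edit_distance(src, tm_src) / max(len(src), len(tm_src))
```

Note that this score must be computed against every candidate in the TM database, which is one reason sentence-level retrieval becomes expensive at scale.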
We select the sentences with the highest similarity to the current source sentence from the TM database by calculating a similarity score. The retrieval approaches used in previous work usually compute a sentence-level similarity score, such as edit distance (Gu et al., 2017; Xia et al., 2019), an IDF-based similarity score (Bapna & Firat, 2019), or cosine similarity (Xu et al., 2020). Current TM work experiments with relatively small datasets, usually only hundreds of thousands of sentences. One main reason is the use of sentence-level similarity: with a relatively high similarity threshold, there is a high probability that no similar sentence will be found in the TM database for a given source sentence. Although Bapna & Firat (2019) and Xu et al. (2020) also use n-gram-based search, they still need to select the sentence that maximizes the n-gram similarity with the source sentence. Meanwhile, a small training set also leads to insufficient training of the network parameters. In this paper, to address the problems presented above, we propose a novel and effective method for combining TM and NMT. The key idea is to treat the matched TM-source and TM-target as an additional signal and encode them together with the source sentence. Specifically, we first find the matched TM-source and TM-target sentence pairs from our training corpus. To strengthen the semantic relationship between the source sentence, TM-source, and TM-target, we use a universal encoder to encode the three sentences and obtain their context representations simultaneously. We then explore four methods to incorporate the context information into the decoding network.
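One simple way to realize this joint encoding is to feed a single concatenated sequence to the shared encoder, so that self-attention can relate all three sentences. The sketch below shows only the input construction; the separator token, segment ids, and function name are illustrative assumptions, not details specified in this paper:

```python
def build_encoder_input(src_tokens, tm_src_tokens, tm_tgt_tokens, sep="[SEP]"):
    """Concatenate source, TM-source, and TM-target into one token sequence
    so a single (PLM-initialized) encoder can model interactions among them
    via self-attention; segment ids distinguish the three parts."""
    tokens, segments = [], []
    for seg_id, part in enumerate([src_tokens, tm_src_tokens, tm_tgt_tokens]):
        tokens.extend(part + [sep])
        segments.extend([seg_id] * (len(part) + 1))
    return tokens, segments
```

Because all three sentences pass through one encoder, no extra encoder parameters are introduced and the cross-sentence semantics are captured directly by attention.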
To further strengthen the ability to copy useful information from the TM context and to alleviate the rare-word problem, we integrate a pointer network (Gulcehre et al., 2016; Gu et al., 2016; See et al., 2017) into the decoder. To obtain a sufficient training corpus and train the network parameters more fully and effectively, we also modify the retrieval algorithm and use a pre-trained language model (PLM) to initialize the encoder's parameters. Partially inspired by phrase-based SMT, our retrieval method does not compute a sentence-level similarity score between two sentences. If two sentences share a common n-gram segment, we assume that they are similar and that the sentence pairs in the TM database can provide useful segments to help improve translation quality. Many studies have shown that a PLM can offer valuable prior knowledge to enhance the translation performance of NMT (Weng et al., 2020; Song et al., 2019), so we employ a PLM to initialize the parameters of the encoder and give it well-trained parameters as a starting point. To validate the effectiveness of the proposed approach, we implement our idea on top of the state-of-the-art Transformer model (Vaswani et al., 2017). A series of experiments on the English-to-French translation task demonstrates that the proposed method can significantly improve NMT with TM information. In summary, we make three main contributions: • We employ n-gram retrieval to find similar sentences, which is simple and fast. It does not need a complicated fuzzy-match algorithm to calculate the similarity between the source sentence and the TM-source from the TM or training data. • We do not need an extra encoder to encode the retrieved TM sentences; instead, we use a single encoder enhanced by a PLM to model the semantic relationship between the TM sentences and the source sentence and to obtain their context representations simultaneously.
• We apply the copy mechanism to alleviate the rare-word problem, especially when the training corpus is insufficient.
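The n-gram retrieval in the first contribution can be sketched with an inverted index from source-side n-grams to TM sentence pairs: any pair sharing at least one n-gram with the source is treated as similar, so no similarity score is ever computed. Function names and the fixed n below are illustrative assumptions:

```python
from collections import defaultdict

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as hashable tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_index(tm_pairs, n=3):
    """Map each source-side n-gram to the ids of TM pairs containing it."""
    index = defaultdict(set)
    for pair_id, (tm_src, _tm_tgt) in enumerate(tm_pairs):
        for g in ngrams(tm_src, n):
            index[g].add(pair_id)
    return index

def retrieve(src_tokens, index, n=3):
    """Return ids of TM pairs sharing at least one n-gram with the source.
    Each lookup is a dictionary access: no fuzzy-match score is needed."""
    hits = set()
    for g in ngrams(src_tokens, n):
        hits |= index.get(g, set())
    return hits
```

Since the index is built once and each query is a handful of dictionary lookups, this retrieval scales to much larger TM databases than per-sentence similarity scoring.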



2. RELATED WORK

We first look at the studies that integrate Translation Memory into machine translation. Many methods have been proposed to combine TM and MT. For example, Koehn & Senellart (2010) apply the matched segments from a TM to SMT during decoding. However, the integration of TM and NMT is more complicated, and only limited efforts have been made so far compared with the fusion of TM and SMT. Cao & Xiong (2018) treat this as a multi-input problem and use a multi-encoder framework to encode the retrieved TM-target sentence and the current source sentence. On this basis, Bapna & Firat (2019) propose a new approach that incorporates information from the source sentence and TM-source while encoding the TM-target. Gu et al. (2017) encode all similar sentences from the TM into context vectors and use these vectors to decode the target words through an additional attention mechanism. Xia et al. (2019) further extend the method of Gu et al. (2017) by packing the sequential TM into a graph, leading to a more efficient attention computation. Our proposed method does not require an extra encoder, so no additional encoder parameters are introduced. Additionally, different from the method proposed by Bapna & Firat (2019), we encode the three sentences (source, TM-source, TM-target) and obtain their context representations simultaneously.

