SIMPLE AND SCALABLE NEAREST NEIGHBOR MACHINE TRANSLATION

Abstract

kNN-MT (Khandelwal et al., 2021) is a straightforward yet powerful approach for fast domain adaptation, which equips pre-trained neural machine translation (NMT) models with domain-specific token-level k-nearest-neighbor (kNN) retrieval to achieve domain adaptation without retraining. Despite being conceptually attractive, kNN-MT suffers from massive storage requirements and high computational complexity, since it conducts nearest-neighbor searches over the entire reference corpus. In this paper, we propose a simple and scalable nearest neighbor machine translation framework that drastically improves the decoding and storage efficiency of kNN-based models while maintaining translation performance. To this end, we dynamically construct an extremely small datastore for each input via sentence-level retrieval, avoiding the search over the entire datastore performed by vanilla kNN-MT; on top of this, we further introduce a distance-aware adapter to adaptively incorporate the kNN retrieval results into the pre-trained NMT model. Experiments on machine translation in two general settings, static domain adaptation and online learning, demonstrate that our proposed approach not only achieves almost 90% of the decoding speed of the base NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT.

1. INTRODUCTION

Domain adaptation is one of the fundamental challenges in machine learning, which aspires to cope with the discrepancy across domain distributions and improve the generality of trained models. It has attracted wide attention in the neural machine translation (NMT) area (Britz et al., 2017; Chen et al., 2017; Chu & Wang, 2018; Bapna & Firat, 2019; Bapna et al., 2019; Wei et al., 2020). Recently, kNN-MT and its variants (Khandelwal et al., 2021; Zheng et al., 2021a; b; Wang et al., 2022a) have provided a new paradigm and achieved remarkable performance for fast domain adaptation via retrieval pipelines. These approaches combine traditional NMT models (Bahdanau et al., 2015; Vaswani et al., 2017) with a token-level k-nearest-neighbor (kNN) retrieval mechanism, allowing them to directly access a domain-specific datastore to improve translation accuracy without fine-tuning the entire model. By virtue of this promising ability, a single kNN-MT model can be seamlessly generalized to other domains by simply altering the external knowledge it attends to.

In spite of these significant achievements and potential benefits, the critical bottleneck of kNN-MT is its large token-level external knowledge (also called a datastore), which brings massive storage overhead and high latency during inference. For instance, Khandelwal et al. (2021) found that kNN-MT is two orders of magnitude slower than the base NMT system in generation speed when retrieving 64 keys from a datastore containing billions of records. To ease this drawback and make kNN search more efficient, one line of work (Martins et al., 2022b; Wang et al., 2022a) proposed methods to reduce the volume of the datastore, such as pruning redundant records and reducing the dimension of the keys. Along another line, Meng et al. (2022) designed Fast kNN-MT, which constructs a smaller datastore for each source sentence instead of consulting the entire datastore.
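To make the token-level retrieval mechanism concrete, the following is a minimal sketch of how a kNN-MT-style system forms its next-token distribution: the decoder's hidden state queries a datastore of (hidden state, target token) pairs, the retrieved distances are softmax-normalized into a retrieval distribution, and that distribution is interpolated with the base NMT model's output. The function name, array shapes, and hyperparameter values (`k`, `temperature`, `lam`) are illustrative assumptions, not the exact configuration used in the cited papers, and a real system would use an approximate index (e.g. FAISS) rather than brute-force distances.

```python
import numpy as np

def knn_mt_probs(query, keys, values, p_nmt, k=4, temperature=10.0, lam=0.5):
    """Illustrative kNN-MT next-token distribution (hypothetical shapes/params).

    query:  decoder hidden state at the current step, shape (d,)
    keys:   datastore keys (decoder hidden states), shape (N, d)
    values: datastore values (target-token ids), shape (N,)
    p_nmt:  base NMT distribution over the vocabulary, shape (V,)
    """
    # Brute-force L2 distances from the query to every datastore key.
    dists = np.sum((keys - query) ** 2, axis=1)
    idx = np.argsort(dists)[:k]  # indices of the k nearest entries

    # Convert negative distances into normalized weights over retrieved entries.
    w = np.exp(-dists[idx] / temperature)
    w /= w.sum()

    # Scatter the weights into a vocabulary-sized retrieval distribution
    # (duplicate retrieved tokens accumulate their weights).
    p_knn = np.zeros_like(p_nmt)
    np.add.at(p_knn, values[idx], w)

    # Interpolate the retrieval and base-model distributions.
    return lam * p_knn + (1.0 - lam) * p_nmt
```

Because both component distributions sum to one, the interpolated output is itself a valid distribution; the slowdown discussed above comes from computing (or approximately searching) the distances over a datastore with billions of entries at every decoding step.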
Typically, the small datastore is constructed by searching for the nearest token-level neighbors of the source tokens and mapping them to the corresponding target tokens. However, in essence, Fast kNN-MT

