FEDERATED NEAREST NEIGHBOR MACHINE TRANSLATION

Abstract

To protect user privacy and meet legal regulations, federated learning (FL) is attracting significant attention. However, training neural machine translation (NMT) models with traditional FL algorithms (e.g., FedAvg) typically relies on multi-round model-based interactions, which are impractical and inefficient for translation tasks due to vast communication overhead and heavy synchronization. In this paper, we propose a novel Federated Nearest Neighbor (FedNN) machine translation framework that, instead of multi-round model-based interactions, leverages one-round memorization-based interaction to share knowledge across different clients and build low-overhead privacy-preserving systems. Our approach equips a public NMT model, trained on large-scale accessible data, with a k-nearest-neighbor (kNN) classifier, and integrates an external datastore constructed from the private text data of all clients to form the final FL model. A two-phase datastore encryption strategy is introduced to preserve privacy during this process. Extensive experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg, while maintaining promising translation performance in different FL settings.

1. INTRODUCTION

In recent years, neural machine translation (NMT) has significantly improved translation quality (Bahdanau et al., 2015; Vaswani et al., 2017; Hassan et al., 2018) and has been widely adopted in many commercial systems. The current mainstream system is first built on a large-scale corpus collected by the service provider and then directly applied to translation tasks for different users and enterprises. However, this application paradigm faces two critical challenges in practice. On the one hand, previous works have shown that NMT models perform poorly in specific scenarios, especially when they are trained on corpora from very distinct domains (Koehn & Knowles, 2017; Chu & Wang, 2018). Fine-tuning is a popular way to mitigate the effect of domain drift, but it incurs additional model deployment overhead and, in particular, requires high-quality in-domain data provided by users or enterprises. On the other hand, some users and enterprises impose strict data security requirements due to business concerns or government regulations (e.g., GDPR and CCPA), meaning that the service provider cannot directly access private user data for model training. Thus, conventional centralized training is infeasible in these scenarios.

In response to this dilemma, a natural solution is federated learning (FL) (Li et al., 2019), which enables different data owners to train a global model in a distributed manner while keeping raw private data isolated to preserve data privacy. Generally, a standard FL workflow, such as FedAvg (McMahan et al., 2017), consists of multi-round model-based interactions between a server and clients. At each round, each client first trains on its local sensitive data and sends its model update to the server; the server then aggregates these local updates to build an improved global model.
This straightforward idea has been implemented by prior works (Roosta et al., 2021; Passban et al., 2022) that directly apply FedAvg to machine translation tasks.
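To make the communication pattern concrete, the server-side aggregation step of FedAvg described above can be sketched as follows. This is a minimal illustrative sketch with toy scalar parameters, not the implementation used in the cited works; the function and variable names are our own.

```python
def fedavg_round(global_params, client_updates, client_sizes):
    """One hypothetical FedAvg aggregation step (McMahan et al., 2017).

    global_params: dict mapping parameter name -> value (toy scalars here)
    client_updates: list of dicts, each a client's locally trained parameters
    client_sizes: number of local training examples per client
    """
    total = sum(client_sizes)
    new_params = {}
    for name in global_params:
        # Average the clients' local parameters, weighted by data size.
        new_params[name] = sum(
            (n / total) * update[name]
            for update, n in zip(client_updates, client_sizes)
        )
    return new_params

# Two clients with unequal data sizes: the larger client dominates.
updates = [{"w": 1.0}, {"w": 3.0}]
sizes = [1, 3]
print(fedavg_round({"w": 0.0}, updates, sizes))  # {'w': 2.5}
```

Note that every parameter of the full model must cross the network in every round, which is the communication and synchronization cost that motivates FedNN's one-round memorization-based alternative.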

