SECURE FEDERATED LEARNING OF USER VERIFICATION MODELS

Abstract

We consider the problem of training User Verification (UV) models in a federated setup, where conventional loss functions are not applicable due to the constraints that each user has access to the data of only one class and user embeddings cannot be shared with the server or other users. To address this problem, we propose Federated User Verification (FedUV), a framework for private and secure training of UV models. In FedUV, users jointly learn a set of vectors and maximize the correlation of their instance embeddings with a secret, user-defined linear combination of those vectors. We show that choosing the linear combinations from the codewords of an error-correcting code allows users to collaboratively train the model without revealing their embedding vectors. We present experimental results for user verification with voice, face, and handwriting data and show that FedUV is on par with existing approaches, while not sharing the embeddings with other users or the server.

1. INTRODUCTION

There has been a recent increase in the research and development of User Verification (UV) models with various modalities such as voice (Snyder et al., 2017; Yun et al., 2019), face (Wang et al., 2018), fingerprint (Cao & Jain, 2018), or iris (Nguyen et al., 2017). Machine learning-based UV models have been adopted by commercial smart devices such as mobile phones, AI speakers, and automotive infotainment systems for a variety of applications, such as unlocking the system or providing user-specific services, e.g., music recommendation, schedule notification, or other configuration adjustments (Matei, 2017; Barclays, 2013; Mercedes, 2020).

User verification is a binary decision problem of accepting or rejecting a test example based on its similarity to the user's training examples. We consider embedding-based classifiers, in which a test example is accepted if its embedding is close enough to a reference embedding, and rejected otherwise. Such classifiers are usually trained with a loss function composed of two terms: 1) a positive loss that minimizes the distance of the instance embedding to the positive class embedding, and 2) a negative loss that maximizes the distance to the negative class embeddings. The negative loss term is needed to prevent the class embeddings from collapsing into a single point (Bojanowski & Joulin, 2017).

Verification models need to be trained with data from a large variety of users so that the model learns different data characteristics and can reliably reject impostors. However, due to the privacy-sensitive nature of the biometric data used for verification, it is not possible to centrally collect large training datasets. One approach to address this data collection problem is to train the model in the federated setup, a framework for training models by repeatedly communicating model weights and gradients between a central server and a group of users (McMahan et al., 2017a).
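The embedding-based decision rule and two-term loss described above can be sketched as follows. This is a minimal illustration, not the paper's actual training objective; the threshold value, the use of cosine similarity for the decision, and squared Euclidean distances in the loss are assumptions for the sake of the example.

```python
import numpy as np

def verify(instance_emb, reference_emb, threshold=0.8):
    """Accept a test example if its embedding is close enough to the
    user's reference embedding (here: cosine similarity >= threshold)."""
    sim = instance_emb @ reference_emb / (
        np.linalg.norm(instance_emb) * np.linalg.norm(reference_emb))
    return sim >= threshold

def uv_loss(instance_emb, class_embs, positive_idx):
    """Two-term loss: a positive term pulls the instance embedding toward
    its own class embedding, and a negative term pushes it away from the
    other class embeddings (preventing all classes from collapsing)."""
    dists = np.linalg.norm(class_embs - instance_emb, axis=1)
    positive = dists[positive_idx] ** 2
    negative = -np.mean(np.delete(dists, positive_idx) ** 2)
    return positive + negative
```

Note that computing the negative term requires access to the other users' class embeddings, which is exactly what the federated UV setting forbids.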
Federated learning (FL) allows training models without users having to share their data with the server or other users and, hence, helps enable private training of verification models. Training UV models in the federated setup, however, imposes additional constraints: each user has access to the data of only one class, and users cannot share their embedding vectors with the server or other users for security reasons. Specifically, sharing embeddings with others can lead to both training- and test-time attacks. For example, the server or other users can run a poisoning attack (Biggio et al., 2012) and train the model so that it verifies fake examples as the examples generated by a particular user.

