FEDERATED REPRESENTATION LEARNING VIA MAXIMAL CODING RATE REDUCTION

Anonymous

Abstract

We propose a federated methodology for learning low-dimensional representations from a dataset that is distributed among several clients. In particular, we move away from the cross-entropy loss commonly used in federated learning, and instead learn shared low-dimensional representations of the data in a decentralized manner via the principle of maximal coding rate reduction (MCR^2). Our proposed method, which we refer to as FLOW, adopts MCR^2 as its objective, resulting in representations that are both between-class discriminative and within-class compressible. We theoretically show that our distributed algorithm converges to a first-order stationary point. Moreover, we demonstrate, via numerical experiments, the utility of the learned low-dimensional representations.
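To make the objective concrete, the following is a minimal NumPy sketch of the MCR^2 criterion as introduced by Yu et al.: the coding rate of the full feature matrix (expansion) minus the label-weighted coding rates of the per-class feature subsets (compression). Function names and the choice of distortion parameter eps are ours, for illustration only.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Rate-distortion coding rate of features Z (d x n) up to distortion eps:
    R(Z) = (1/2) logdet(I + d/(n eps^2) Z Z^T)."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def mcr2_objective(Z, labels, eps=0.5):
    """MCR^2: expand the coding rate of all features while compressing
    the coding rate of each class, weighted by its fraction of samples."""
    n = Z.shape[1]
    expand = coding_rate(Z, eps)
    compress = sum(
        (np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
        for c in np.unique(labels)
    )
    return expand - compress  # maximized during training
```

For example, two classes whose features lie along orthogonal directions attain a strictly positive rate reduction, whereas two classes collapsed onto the same direction attain zero, which is exactly the sense in which maximizing MCR^2 encourages between-class discrimination and within-class compression.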

1. INTRODUCTION

Federated Learning (FL) has become the tool of choice when seeking to learn from distributed data. As opposed to a centralized setting where data are concentrated in a single node, FL allows datasets to be distributed among a set of clients. This subtle difference plays an important role in practice, where data collection has moved to the edge (e.g., cellphones, cameras, sensors, etc.), and centralizing all the available data might not be possible due to privacy constraints and hardware limitations. Moreover, under the FL paradigm, clients train on their local datasets, which, unlike the centralized setting, exploits the computing resources available at the edge (i.e., at each client).

The key challenges in FL include dealing with (i) data imbalances between clients, (ii) unreliable connections between the server and the clients, (iii) a large number of clients participating in the communication, and (iv) objective mismatch between clients. A vast amount of successful work has been done to address challenges (i), (ii), and (iii). However, the often-overlooked challenge of objective mismatch plays a fundamental role in any distributed problem. For a client to participate in a collaborative training process (as opposed to training on its own private dataset), there must be a motivation: each client should see itself improved by taking part in the collaboration. Recent work has shown that, even in the case of convex losses, FL converges to a stationary point of a mismatched optimization problem. This implies that there are cases where certain clients own the majority of the data (or even of certain classes), and see their individual performance curtailed by the collaborative approach. When optimizing the average of the losses over the clients, the solution to the optimization problem generally differs from the solutions of the individual per-client optimization problems.
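This mismatch can be seen in a toy example (hypothetical, not from the paper): two clients with quadratic losses f_k(w) = (w - a_k)^2 and local optima a_1 = 0, a_2 = 10. When the clients hold imbalanced amounts of data, federated training minimizes the sample-weighted average loss, whose minimizer is the weighted mean of the local optima and thus coincides with neither.

```python
import numpy as np

a = np.array([0.0, 10.0])          # per-client optimal models
n = np.array([9, 1])               # per-client sample counts (9:1 imbalance)
p = n / n.sum()                    # client weights

w_global = p @ a                   # minimizer of sum_k p_k (w - a_k)^2
per_client_loss = (w_global - a) ** 2

# Even the majority client (a_1 = 0) is pulled away from its own optimum,
# and the minority client fares far worse than it would training alone.
print(w_global)                    # 1.0
print(per_client_loss)             # [ 1. 81.]
```

Each client's loss at the global model is strictly worse than the zero loss it would attain at its own local optimum, which is precisely the objective mismatch discussed above.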
Objective mismatch becomes a particularly difficult problem in FL given the privacy limitations, which prevent the central server from curtailing this undesirable effect. Moreover, given that in standard FL the central server possesses no data, and that no proxies of the data structure should be shared, a centralized solution cannot be implemented. Several approaches have been proposed to resolve the objective mismatch issue. However, most of them rely on obtaining more trustworthy gradients at the clients, at the expense of either more communication rounds or more expensive communications. In this work, we propose an alternative representation learning-based approach to resolving objective mismatch, where low-dimensional representations of the data are learned in a distributed manner. We specifically bridge two seemingly disconnected fields, namely federated representation learning and rate distortion theory. We leverage rate distortion theory to propose a principled way of

