TOWARDS UNDERSTANDING AND MITIGATING DIMENSIONAL COLLAPSE IN HETEROGENEOUS FEDERATED LEARNING

Abstract

Federated learning aims to train models collaboratively across different clients without sharing data for privacy considerations. However, one major challenge for this learning paradigm is the data heterogeneity problem, which refers to the discrepancies between the local data distributions of different clients. To tackle this problem, we first study how data heterogeneity affects the representations of the globally aggregated models. Interestingly, we find that heterogeneous data results in the global model suffering from severe dimensional collapse, in which representations tend to reside in a lower-dimensional space instead of the ambient space. Moreover, we observe a similar phenomenon on models locally trained on each client and deduce that the dimensional collapse on the global model is inherited from the local models. In addition, we theoretically analyze the gradient flow dynamics to shed light on how data heterogeneity results in dimensional collapse for local models. To remedy this problem caused by data heterogeneity, we propose FEDDECORR, a novel method that can effectively mitigate dimensional collapse in federated learning. Specifically, FEDDECORR applies a regularization term during local training that encourages different dimensions of representations to be uncorrelated. FEDDECORR, which is implementation-friendly and computationally efficient, yields consistent improvements over baselines on standard benchmark datasets. Code: https://github.com/bytedance/FedDecorr.
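As a concrete illustration of the kind of decorrelation regularizer described above, the sketch below penalizes the squared off-diagonal entries of the correlation matrix computed over a batch of representations, which vanishes exactly when all dimensions are pairwise uncorrelated. This is a minimal PyTorch sketch, not necessarily the exact FEDDECORR formulation; the function name, the epsilon, and the 1/d² scaling are illustrative assumptions.

```python
import torch

def decorrelation_loss(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Penalize correlations between different representation dimensions.

    z: a batch of representations with shape (N, d).
    Returns a scalar that vanishes when all d dimensions are uncorrelated.
    """
    n, d = z.shape
    z = z - z.mean(dim=0, keepdim=True)           # center each dimension
    z = z / (z.std(dim=0, keepdim=True) + eps)    # scale to (roughly) unit variance
    corr = (z.T @ z) / (n - 1)                    # (d, d) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).sum() / (d * d)        # mean squared off-diagonal entry
```

During local training, such a term would be added to the usual task loss with a small coefficient (e.g., `loss = ce_loss + beta * decorrelation_loss(z)`), so that each client's representations are discouraged from collapsing onto a few dimensions.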

1. INTRODUCTION

With the rapid development of deep learning and the availability of large amounts of data, concerns regarding data privacy have been attracting increasingly more attention from industry and academia. To address this concern, McMahan et al. (2017) propose Federated Learning, a decentralized training paradigm that enables collaborative training across different clients without sharing data. One major challenge in federated learning is the potential discrepancy in the distributions of local training data among clients, known as the data heterogeneity problem. In particular, this paper focuses on the heterogeneity of label distributions (see Fig. 1(a) for an example). Such discrepancies can result in drastic disagreements between the local optima of the clients and the desired global optimum, which may lead to severe performance degradation of the global model.

Previous works attempting to tackle this challenge mainly focus on the model parameters, either during local training (Li et al., 2020; Karimireddy et al., 2020) or during global aggregation (Wang et al., 2020b). However, because deep neural networks are typically heavily over-parameterized, these methods usually incur an excessive computation burden or high communication costs (Li et al., 2021a). In contrast, in this work, we focus on the representation space of the model and study the impact of data heterogeneity there.

To commence, we study how heterogeneous data affects the global model in federated learning in Sec. 3.1. Specifically, we compare representations produced by global models trained under different degrees of data heterogeneity. Since the singular values of the covariance matrix provide a comprehensive characterization of the distribution of high-dimensional embeddings, we use them to measure the degree of dimensional collapse.
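To make this diagnostic concrete, the sketch below estimates the covariance matrix of a model's representations over a dataset and returns its singular value spectrum; a spectrum in which most singular values are near zero indicates that the embeddings collapse into a low-dimensional subspace. This is a minimal PyTorch sketch under stated assumptions: the `model.features(x)` hook returning (N, d) embeddings and the `(x, label)` loader format are illustrative and should be adapted to the actual architecture.

```python
import torch

@torch.no_grad()
def representation_spectrum(model, loader, device="cpu"):
    """Singular values of the covariance matrix of a model's representations.

    A rapidly decaying spectrum (many near-zero singular values) suggests
    dimensional collapse: representations occupy a low-dimensional subspace
    of the ambient embedding space.
    """
    model.eval()
    feats = []
    for x, _ in loader:
        # Assumed hook: `features` returns the (N, d) penultimate-layer embeddings.
        feats.append(model.features(x.to(device)).cpu())
    z = torch.cat(feats)                      # (N, d) stacked representations
    z = z - z.mean(dim=0, keepdim=True)       # center each dimension
    cov = (z.T @ z) / (z.shape[0] - 1)        # (d, d) sample covariance matrix
    return torch.linalg.svdvals(cov)          # singular values, descending order
```

Plotting these singular values (e.g., on a log scale) for global models trained under different degrees of heterogeneity makes the collapse visible as a faster decay of the spectrum.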



