FOCUS: FAIRNESS VIA AGENT-AWARENESS FOR FEDERATED LEARNING ON HETEROGENEOUS DATA

Anonymous authors
Paper under double-blind review

Abstract

Federated learning (FL) provides an effective collaborative training paradigm that allows local agents to jointly train a global model without sharing their local data, thereby protecting privacy. However, due to the heterogeneous nature of local data, it is challenging to optimize, or even define, the fairness of the trained global model for the agents. For instance, existing work usually treats accuracy parity across agents as fairness in FL, which is limited, especially in the heterogeneous setting: it is intuitively "unfair" to force agents with high-quality data (e.g., hospitals with high-resolution data and fine-grained labels) to achieve accuracy similar to those who contribute low-quality data (e.g., hospitals with low-resolution data and noisy labels), and such a requirement may discourage agents from participating in FL. In this work, we aim to address these limitations and propose a formal fairness definition for FL, fairness via agent-awareness (FAA), which takes the different contributions of heterogeneous agents into account. Under FAA, the performance of agents with high-quality data is not sacrificed merely because a large number of agents with low-quality data also participate. In addition, we propose FOCUS, a fair FL training algorithm based on agent clustering, to achieve fairness in FL as measured by FAA. Theoretically, we prove the convergence and optimality of FOCUS under mild conditions for linear and general convex loss functions with bounded smoothness. We also prove that FOCUS always achieves higher fairness in terms of FAA than standard FedAvg for both linear and general convex loss functions. Empirically, we evaluate FOCUS on four datasets, including synthetic data, images, and text, under different settings, and show that FOCUS achieves significantly higher fairness in terms of FAA while maintaining similar or even higher prediction accuracy compared with FedAvg and other existing fair FL algorithms.

1. INTRODUCTION

Federated learning (FL) is emerging as a promising approach to enable scalable intelligence in distributed settings such as mobile networks (Lim et al., 2020; Hard et al., 2018). Given the wide adoption of FL, including medical analysis (Sheller et al., 2020; Adnan et al., 2022), recommendation systems (Minto et al., 2021; Anelli et al., 2021), and personal Internet of Things (IoT) devices (Alawadi et al., 2021), ensuring the fairness of the trained global model is of great importance before large-scale deployment, especially when the data quality and contributions of agents differ in the heterogeneous setting. In general, fairness is defined as the protection of a specific attribute, and fair FL usually takes the form of equity: no individual who joins collaborative learning should suffer poor performance because of their identity. Several studies have explored fairness in FL, mainly focusing either on the fairness of the final trained model with respect to protected attributes, without considering the different contributions of agents (Chu et al., 2021; Hu et al., 2022), or on accuracy parity across agents (Li et al., 2020b; Donahue & Kleinberg, 2022a; Mohri et al., 2019). Some works have considered properties of local agents, such as local data properties (Zhang et al., 2020; Kang et al., 2019) and data size (Donahue & Kleinberg, 2022b). However, a fairness analysis of FL under heterogeneous data distributions is still lacking. Thus, in this paper, we ask: How should fairness in FL be defined so that it takes the different contributions of heterogeneous local agents into account? And can we enhance the fairness of FL with improved training algorithms?
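To make the agent-clustering idea concrete before the formal treatment, the following is a minimal, self-contained Python sketch of clustered federated averaging on synthetic linear data. It illustrates the general idea only, not the exact FOCUS algorithm: the warm-start heuristic, hyperparameters, and all function names here are our own illustrative assumptions.

```python
# Sketch: clustered federated averaging on synthetic linear-regression data.
# NOT the exact FOCUS algorithm; an illustration of the clustering idea.
import numpy as np

rng = np.random.default_rng(0)

def make_agent(w_true, noise, n=50):
    """One agent's local dataset: y = X @ w_true + Gaussian noise."""
    X = rng.normal(size=(n, 2))
    y = X @ w_true + rng.normal(scale=noise, size=n)
    return X, y

# Two latent groups of agents with different ground-truth models.
w_a, w_b = np.array([1.0, -1.0]), np.array([-1.0, 1.0])
agents = [make_agent(w_a, 0.1) for _ in range(5)] + \
         [make_agent(w_b, 0.1) for _ in range(5)]

def local_loss(w, X, y):
    r = X @ w - y
    return float(r @ r) / len(y)

# Warm-start each of K=2 cluster models from one agent's local
# least-squares fit (an illustrative heuristic, not part of FOCUS).
models = [np.linalg.lstsq(agents[0][0], agents[0][1], rcond=None)[0],
          np.linalg.lstsq(agents[-1][0], agents[-1][1], rcond=None)[0]]

for _ in range(30):  # communication rounds
    # Assignment step: each agent joins the cluster whose model
    # currently fits its local data best.
    assign = [int(np.argmin([local_loss(w, X, y) for w in models]))
              for X, y in agents]
    # Update step: one local gradient step per agent, then
    # FedAvg-style averaging within each cluster.
    for k in range(len(models)):
        members = [i for i, a in enumerate(assign) if a == k]
        if not members:
            continue
        updates = []
        for i in members:
            X, y = agents[i]
            grad = 2.0 * X.T @ (X @ models[k] - y) / len(y)
            updates.append(models[k] - 0.1 * grad)
        models[k] = np.mean(updates, axis=0)
```

In this toy setting, each group of agents ends up sharing a cluster model close to its own ground truth, so agents with one data distribution are not dragged toward another group's model, which mirrors the motivation above: clustering lets fairness be assessed among agents with comparable data rather than enforcing uniform accuracy across all agents.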

