SEMI-VARIANCE REDUCTION FOR FAIR FEDERATED LEARNING

Anonymous authors

Paper under double-blind review

Abstract

Ensuring fairness in Federated Learning (FL) systems, i.e., satisfactory performance for all of the diverse clients in the system, is an important and challenging problem. There are multiple fair FL algorithms in the literature, which have been relatively successful in providing fairness. However, these algorithms mostly emphasize the loss functions of the worst-off clients to improve their performance, which often results in the suppression of well-performing ones. As a consequence, they usually sacrifice the system's overall average performance to achieve fairness. Motivated by this, and inspired by two well-known risk modeling methods in Finance, Mean-Variance and Mean-Semi-Variance, we propose and study two new fair FL algorithms, Variance Reduction (VRed) and Semi-Variance Reduction (Semi-VRed). VRed encourages equality between clients' loss functions by penalizing their variance. In contrast, Semi-VRed penalizes the discrepancy of only the worst-off clients' loss functions from the average loss. Through extensive experiments on multiple vision and language datasets, we show that Semi-VRed achieves SoTA performance in scenarios with highly heterogeneous data distributions and improves both fairness and the system's overall average performance.

1. INTRODUCTION

Federated Learning (FL) (McMahan et al., 2017) is a framework consisting of a number of clients and the private data distributed among them; it allows training a shared or personalized model based on the clients' data. Since the introduction of FL with the well-known FedAvg algorithm (McMahan et al., 2017), it has attracted an intense amount of attention, and much progress has been made on its different aspects, including algorithmic innovations (Li et al., 2020b; Reddi et al., 2020a; Pathak & Wainwright, 2020; Huo et al., 2020; Wang et al., 2020; Reddi et al., 2020b; Qu et al., 2022), fairness (McMahan et al., 2017; Li et al., 2020c; Mohri et al., 2019; Li et al., 2020a; Yue et al., 2021; Zhang et al., 2022a), convergence analysis (Khaled et al., 2020; Li et al., 2020; Gorbunov et al., 2021), and personalization (Zhang et al., 2021; Chen & Chao, 2022; Oh et al., 2022; Zhang et al., 2022b; Bietti et al., 2022). Due to heterogeneity in the clients' data and resources, performance fairness is an important challenge in FL systems. There have been several previous works addressing this problem. For instance, Mohri et al. (2019) proposed Agnostic Federated Learning (AFL), which aims at minimizing the largest loss function among clients through a minimax optimization framework. Similarly, Li et al. (2020a) proposed an algorithm called TERM using tilted losses. Ditto (Li et al., 2021) is another existing algorithm, based on model personalization for clients¹. Also, q-Fair Federated Learning (q-FFL) (Li et al., 2020c) is an algorithm inspired by α-fairness in wireless networks (Lan et al., 2010). Recently, Zhang et al. (2022a) proposed PropFair, based on the concept of Proportional Fairness (PF). Interestingly, they also showed that all the aforementioned fair FL algorithms can be unified into a generalized mean framework.
GiFair (Yue et al., 2021) is another recent algorithm, which achieves fairness through a different mechanism than the algorithms above: it penalizes the discrepancy between clients' loss functions, i.e., it encourages equality. FCFL (Cui et al., 2021) uses a constrained version of AFL to achieve both algorithmic parity and performance consistency in FL settings. Being designed for fair FL, the aforementioned algorithms usually result in the suppression of well-performing clients, due to the lower weights the algorithms place on them or due to the equality encouraged between clients' losses (GiFair). As a consequence, they achieve an overall average performance that is either smaller than or close to that of vanilla FedAvg. This is our motivation for proposing two new algorithms. Our inspiration in this paper is a concept in Finance called risk modeling, used for portfolio selection. There are two widely used methodologies for risk modeling: Mean-Variance (MV) (Zhang et al., 2018; Soleimani et al., 2009; Markowitz, 1952) and its extension, Mean-Semi-Variance (MSV) (Boasson et al., 2017; Plà-Santamaria & Bravo, 2013; Ballestero, 2005; Stuart & Markowitz, 1959), which quantify investment return and investment risk. Motivated by the wide usage of these methodologies and their success in financial planning, we bring the MV and MSV methods to FL by proposing our Variance Reduction (VRed) and Semi-Variance Reduction (Semi-VRed) algorithms, respectively. By conducting extensive experiments on popular vision and language datasets, we show that our VRed algorithm achieves performance competitive with existing baseline fair FL algorithms. More importantly, Semi-VRed achieves state-of-the-art performance in terms of both fairness and the system's overall average performance.
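To make the contrast between the two proposed objectives concrete, the following is a minimal numerical sketch of variance- and semi-variance-penalized objectives over client losses. The regularization weight `lam` and the exact functional forms are illustrative assumptions, not the paper's precise formulations.

```python
import numpy as np

def vred_objective(losses, lam=0.1):
    """Average client loss plus a variance penalty (VRed-style).

    `lam` is a hypothetical trade-off weight; every client that deviates
    from the mean loss, in either direction, increases the penalty.
    """
    losses = np.asarray(losses, dtype=float)
    return losses.mean() + lam * losses.var()

def semi_vred_objective(losses, lam=0.1):
    """Average client loss plus a semi-variance penalty (Semi-VRed-style).

    Only clients whose loss exceeds the average (the worst-off clients)
    contribute; well-performing clients are not penalized.
    """
    losses = np.asarray(losses, dtype=float)
    above_mean = np.maximum(losses - losses.mean(), 0.0)
    return losses.mean() + lam * np.mean(above_mean ** 2)

# One straggler among three well-performing clients: both penalties react
# to the straggler, but only VRed also charges the well-performing clients
# for being below the mean.
client_losses = [0.2, 0.3, 0.4, 1.5]
print(vred_objective(client_losses))
print(semi_vred_objective(client_losses))
```

The design difference is visible directly in the numbers: improving a well-performing client (lowering its loss further below the mean) increases the VRed penalty but leaves the Semi-VRed penalty untouched, which is why Semi-VRed need not suppress well-performing clients.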

2. BACKGROUND

Formally, we consider an FL setting with n clients for the task of multi-class classification. Let x ∈ X ⊆ R^p and y ∈ Y = {1, . . . , C} denote the input data point and its target label, respectively. Each client i has its own private data with distribution P_i(x, y). Let h : X × Θ → R^C be the predictor function, parameterized by θ ∈ Θ ⊆ R^d shared among all clients. Also, let ℓ : R^C × Y → R_+ be the loss function, which we choose to be the cross-entropy loss. Client i minimizes the loss function f_i(θ) = E_{(x,y)∼P_i(x,y)}[ℓ(h(x, θ), y)], with minimum value f_i^*. There are various fair FL algorithms in the literature; in Table 3 in the appendix, we provide the formulations of the most recent ones. The existing fair FL algorithms can be grouped into two main categories.

2.1. ALGORITHMS BASED ON THE GENERALIZED MEAN

This category of algorithms includes FedAvg (McMahan et al., 2017), q-FFL (Li et al., 2020c), AFL (Mohri et al., 2019), TERM (Li et al., 2020a), and PropFair (Zhang et al., 2022a). It was shown by Zhang et al. (2022a) that this set of fair FL algorithms can be unified into a generalized mean framework (Kolmogorov, 1930), where more attention is paid to the clients with larger losses.
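The generalized mean framework can be illustrated with the power mean of the client losses: p = 1 recovers the plain average of FedAvg, and as p grows the objective approaches the maximum client loss, i.e., AFL's minimax objective. This is an illustrative sketch of that interpolation, not the paper's exact unification.

```python
import numpy as np

def generalized_mean(losses, p):
    """Power mean of client losses: ((1/n) * sum(f_i^p))^(1/p).

    p = 1 gives the plain average (FedAvg's objective); larger p places
    more weight on clients with larger losses, approaching max(losses)
    (AFL's minimax objective) as p -> infinity.
    """
    losses = np.asarray(losses, dtype=float)
    return np.mean(losses ** p) ** (1.0 / p)

client_losses = [0.2, 0.4, 1.6]
print(generalized_mean(client_losses, 1))   # plain average of the losses
print(generalized_mean(client_losses, 50))  # dominated by the largest loss
```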

2.2. ALGORITHMS BASED ON ENCOURAGING EQUALITY

The second category of fair FL algorithms, which includes GiFair, is based on encouraging equality between clients' loss functions. GiFair adds a regularization term to the objective of FedAvg that penalizes the discrepancy between clients' loss functions (see Table 3 in the appendix), thereby encouraging equality between them. A common feature of all the aforementioned algorithms is their emphasis on the clients with relatively larger losses, which usually results in the suppression of the well-performing clients. This might degrade the overall average performance (measured by the mean test accuracy across clients). In the next sections, we will see that Semi-VRed achieves fairness by regularizing the semi-variance of clients' loss functions, improving both fairness and the system's overall performance simultaneously. In the context of variance regularization, there have been some related works in the literature: Maurer & Pontil (2009) and Namkoong & Duchi (2017) propose regularizing empirical risk minimization (ERM) by the empirical variance of the losses across training samples, to balance bias and variance and to improve out-of-sample (test) performance and convergence rate. Similarly, Shivaswamy & Jebara (2010) propose boosting binary classifiers based on a variance penalty applied to the exponential loss. Variance regularization has also been used for out-of-distribution (domain) generalization: assuming access to data from multiple training domains, Krueger et al. (2021) propose penalizing the variance of training risks across the domains, as a form of distributionally robust optimization, to achieve domain generalization.
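For intuition on the sample-variance regularization line of work cited above, the following is a minimal sketch in the spirit of Maurer & Pontil (2009), where the empirical risk is augmented with a term of order sqrt(variance / n) over per-sample losses. The weight `lam` and the exact form are illustrative assumptions.

```python
import numpy as np

def variance_regularized_risk(sample_losses, lam=1.0):
    """Empirical risk plus an empirical-variance penalty over training
    samples, sketching the sample-variance-bound style of regularizer.

    `lam` is a hypothetical trade-off weight; the cited works derive the
    penalty's form and scaling from concentration bounds.
    """
    sample_losses = np.asarray(sample_losses, dtype=float)
    n = sample_losses.size
    # Unbiased sample variance of the per-example losses.
    var = sample_losses.var(ddof=1) if n > 1 else 0.0
    return sample_losses.mean() + lam * np.sqrt(var / n)

# Two loss profiles with identical means: the dispersed one is penalized.
print(variance_regularized_risk([0.5, 0.5, 0.5, 0.5]))
print(variance_regularized_risk([0.1, 0.2, 0.8, 0.9]))
```

The analogy to this paper is that VRed and Semi-VRed apply the same idea one level up: the penalized quantity is the spread of losses across clients rather than across training samples.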



¹In order to have a fair comparison with our baseline algorithms, we do not use model personalization in this work.



