RS-FAIRFRS: COMMUNICATION EFFICIENT FAIR FEDERATED RECOMMENDER SYSTEM

Abstract

Federated Recommender Systems (FRSs) aim to provide recommendations to clients in a distributed manner while preserving privacy. FRSs suffer from high communication costs due to the communication between the server and a large number of clients. Past literature on federated supervised learning shows that sampling clients randomly improves communication efficiency without jeopardizing accuracy. However, in an FRS each user is considered a separate client, and clients communicate only item gradients. Thus, incorporating random sampling and determining the number of clients to be sampled in each communication round so as to retain the model's accuracy becomes challenging. This paper provides sample complexity bounds on the number of clients that must be sampled in an FRS to preserve accuracy. Next, we consider the issue of demographic bias in FRS, quantified as the difference in the average error rates across different groups. Supervised learning algorithms mitigate group bias by adding a fairness constraint to the training loss, which requires sharing protected attributes with the server; this is prohibited in a federated setting to ensure clients' privacy. We design RS-FAIRFRS, a Random Sampling based Fair Federated Recommender System, which trains a fair global model and, in addition, trains local clients towards this fair global model to reduce demographic bias at the client level without the need to share their protected attributes. We empirically demonstrate on two popular real-world datasets (ML1M, ML100K) and different sensitive features (age and gender) that RS-FAIRFRS reduces communication cost and demographic bias while improving model accuracy.

1. INTRODUCTION

Recommender systems (RSs) have a wide variety of applications in online platforms such as e-business, e-commerce, e-learning, e-tourism, and music and movie recommendation engines (Lu et al. (2015)). Traditional RSs require gathering clients' private information at a central server, leading to serious privacy and security risks. Owing to the increased storage and processing power of edge devices, ML models can now be trained locally. This has led to federated learning (FL) (McMahan et al. (2017)), which allows clients to share their updates with the server without any data transfer. The server proposes a common model, which is communicated to all clients. Using their data and the global model, clients train locally and communicate the updated model to the server. FL has found many applications in the past few years, e.g., Google keyboard query suggestion (Yang et al. (2018)) and many more. This paper focuses on two primary issues: communication efficiency and demographic bias in FRSs. Unlike other applications of FL, where one client has data of many users, in an FRS each user acts as one client constituting a user's profile. FedRec (Lin et al. (2021)), an FRS, expects all the clients to train in parallel using matrix factorization (MF). In each communication round, the server aggregates the model updates from a large number of local clients to obtain a global model, and this global model is then sent back to all the clients. This whole procedure increases the communication cost. We show that random sampling of clients in each communication round reduces the communication cost even when only item gradients of sampled users are communicated. Theoretically, we provide bounds on the ideal fraction of clients to be sampled to maintain the model's accuracy. Proving sample complexity bounds is non-trivial, as the clients may possess non-IID data.
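For intuition, one communication round of FedRec-style matrix factorization with random client sampling can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sampling fraction `C`, the learning rate, and the matrix sizes are arbitrary choices, and the key property shown is that only item gradients leave each client while user vectors stay local.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clients, n_items, k = 100, 50, 8  # illustrative sizes
C = 0.2                             # fraction of clients sampled per round
lr = 0.05

# Global item matrix V; per-client user vectors U never leave the client.
V = rng.normal(scale=0.1, size=(n_items, k))
U = rng.normal(scale=0.1, size=(n_clients, k))

# Sparse ratings: each client (= one user) has rated a few items.
ratings = {c: {int(i): float(rng.integers(1, 6))
               for i in rng.choice(n_items, size=5, replace=False)}
           for c in range(n_clients)}

def local_update(c, V):
    """One local MF step on client c; returns item gradients only."""
    grad_V = np.zeros_like(V)
    for i, r in ratings[c].items():
        err = U[c] @ V[i] - r
        U[c] -= lr * err * V[i]  # user vector updated locally, kept private
        grad_V[i] = err * U[c]   # only item gradients are communicated
    return grad_V

def communication_round(V):
    """Server samples a C-fraction of clients and averages their item gradients."""
    sampled = rng.choice(n_clients, size=int(C * n_clients), replace=False)
    grads = [local_update(c, V) for c in sampled]
    return V - lr * np.mean(grads, axis=0)

def rmse(V):
    se, n = 0.0, 0
    for c, rs in ratings.items():
        for i, r in rs.items():
            se += (U[c] @ V[i] - r) ** 2
            n += 1
    return (se / n) ** 0.5

rmse_start = rmse(V)
for _ in range(200):
    V = communication_round(V)
```

Despite contacting only 20 of 100 clients per round, the training error still decreases over the rounds, which is the behavior the sample complexity bounds make precise.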
To circumvent this issue, we assume an underlying clustering structure on the clients such that clients within a cluster share similar item vectors. The main novelty lies in proving that random sampling will fetch enough representation from each cluster and that the predicted ratings obtained after sampling a small number of clients will not be far (with high probability) from the predicted ratings obtained after communicating with all clients in all the rounds. Fairness in FRSs is a critical yet under-investigated area. Empirically, we show that FedRec offers better recommendations to a particular group of clients. This unfair treatment can fortify social stereotypes based on gender, race, or age, resulting in significant repercussions. So far, researchers have studied fairness mainly in the domain of centralized RSs (Li et al. (2022)). Existing fair FL approaches ameliorate bias in a classification setting where each client possesses data of many users and can thus train for fairness in each communication round. As opposed to this, in an FRS each user acts as one client that sends its item gradients to the server after updating the user vector and item gradients locally. This makes it extremely difficult to train locally towards fairness. We propose a dual-fair vector update technique with two phases. In Phase 1, the server aggregates the received item vectors and trains them towards fairness on a small fraction of data. Even if the global model is fair, local client updates may result in a heavily biased model. Thus, in Phase 2, the clients minimize local error and learn item vectors closer to the globally fair vectors. In summary, our work aims at mitigating the communication bottleneck and group bias in the Federated Recommendation system FedRec (Lin et al. (2021)) for the first time. We list our main contributions below: 1.
We provide sample complexity bounds on the fraction of clients required for maintaining accuracy within the desirable limit in Theorem 4.1. Our experiments show that sampling this many clients improves communication costs in FRS without affecting accuracy, even when the clients do not disclose their user vectors and share only updated item gradients. 2. We show the existence of group bias in FRS, quantified by evaluating discrepancies in the average group losses for each sensitive attribute. To mitigate this issue, we propose a novel dual-fairness vector update technique that tackles the issue of group fairness at the local as well as the global level. 3. Combining the ideas of random sampling and the dual-fairness vector update, we propose RS-FAIRFRS, a novel FRS model which provides communication efficiency and improved fairness as well as accuracy. 4. We show that RS-FAIRFRS mitigates demographic bias and improves accuracy via extensive experimentation on the two popular datasets ML1M and ML100K, with different demographics (age and gender).
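The dual-fair vector update described above can be sketched under simplifying assumptions. Everything below is illustrative rather than the paper's exact algorithm: `fairness_gap` is one way to measure the group-loss discrepancy used to quantify demographic bias, `phase1_server` stands in for the server-side fairness training on a small fraction of data with known group labels, and `phase2_client` adds a hypothetical proximity weight `mu` (our own choice, not from the paper) that keeps local item vectors close to the fair global ones.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, k = 20, 4
mu = 0.5  # illustrative weight pulling local vectors toward the fair global ones

def fairness_gap(V, server_data):
    """Demographic bias: difference of average squared errors across two groups.
    server_data holds tuples (user_vector, item_index, rating, group)."""
    losses = {0: [], 1: []}
    for u, i, r, g in server_data:
        losses[g].append((u @ V[i] - r) ** 2)
    return abs(np.mean(losses[0]) - np.mean(losses[1]))

def phase1_server(V, server_data, lr=0.05, steps=100):
    """Phase 1 (sketch): nudge the aggregated item vectors to shrink the
    group-loss gap, using the small server-side fraction of labeled data."""
    for _ in range(steps):
        g0 = np.mean([(u @ V[i] - r) ** 2 for u, i, r, g in server_data if g == 0])
        g1 = np.mean([(u @ V[i] - r) ** 2 for u, i, r, g in server_data if g == 1])
        worse = 0 if g0 > g1 else 1  # train on the worse-off group's examples
        for u, i, r, g in server_data:
            if g == worse:
                V[i] -= lr * (u @ V[i] - r) * u
    return V

def phase2_client(V_fair, local_ratings, u, lr=0.05, steps=50):
    """Phase 2 (sketch): the client minimizes its local error plus a proximity
    term ||V - V_fair||^2, so local training stays near the fair global model."""
    V = V_fair.copy()
    for _ in range(steps):
        for i, r in local_ratings.items():
            err = u @ V[i] - r
            V[i] -= lr * (err * u + mu * (V[i] - V_fair[i]))
            u -= lr * err * V[i]  # local user vector update, never shared
    return V
```

In this toy setup, Phase 1 provably cannot increase the gap on the server's data when one group starts far worse off, and the proximity term in Phase 2 bounds how far any client can drift from the fair global vectors.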

2. RELATED WORK

We divide the literature review into four sections: (i) federated recommender systems (FRS), (ii) client sampling in federated learning, (iii) fairness in centralized RSs, and (iv) fair federated learning models. We emphasize that, to the best of our knowledge, no existing work targets fairness in FRS.

FL has found applications in smartphone voice assistants, mobile edge computing, and visual object detection (Aledhari et al. (2020)). These applications face numerous challenges, including communication efficiency (Smith et al. (2018)), statistical heterogeneity (Smith et al. (2017)), systems heterogeneity (Bonawitz et al. (2019)), privacy, personalization, and fairness (Kairouz et al. (2021)).

Many past works (Islam et al. (2019); Yao & Huang (2017); Li et al. (2021); Yang et al. (2020a)) develop bias mitigation strategies in traditional RSs; these require sharing sensitive attributes with the server, which would cause privacy leakage in a federated setting. In the FL framework, fairness has been studied by Yue et al. (2021); Kanaparthy et al. (2021); Du et al. (2020); and Zhang et al., in settings where each client possesses the data of many users.

Federated Recommender Systems (FRS): Federated Collaborative Filtering (FCF) (ud din et al. (2019)) is the first FRS to use implicit feedback for providing personalized recommendations. Lin et al. (2021) identify the need for an FRS that uses explicit ratings and propose FedRec, which deploys two techniques, Hybrid Filling (HF) and User Averaging (UA), for privacy preservation. Researchers have been actively exploring many areas of research in FRS, such as denoising (Liang et al. (2021)), personalization (Jalalirad et al. (2019); Anelli et al. (2021)), privacy enhancement (Wang et al. (2021); Hu et al. (2021); Ali et al. (2021)), building robust FRSs (Wu et al. (2022); Zhang et al. (2022); Rong et al. (2022)), mitigating the cold-start issue (Wahab et al. (2022)), and improving the accuracy of FRSs (Perifanis & Efraimidis (2022)). FedFast (Muhammad et al. (2020)) speeds up training in FRS by using an active sampling technique based on clustering of the user vectors.

