INDIVIDUAL FAIRNESS OF DATA PROVIDER REGARDING PRIVACY RISK AND GAIN

Anonymous authors
Paper under double-blind review

Abstract

Fairness and privacy risks are important concerns when deploying machine learning (ML) in the real world. Recent studies have addressed group fairness together with privacy protection, but none has considered individual fairness (IF) together with privacy protection. In this paper, we propose a new definition of IF from the perspective of privacy protection and experimentally evaluate privacy-preserving ML based on the proposed IF. For the proposed definition, we assume that users provide their data to an ML service and adopt the principle that every user should obtain a gain commensurate with their privacy risk. As a user's gain, we measure the accuracy improvement on the user's data when the user provides that data to the ML service. We conducted experiments on image and tabular datasets using three neural networks (NNs) and two tree-based algorithms with differential privacy (DP) guarantees. The results for NNs show that the proposed IF cannot be stably improved by changing the strength of privacy protection or by applying defenses against membership inference attacks. The results for tree-based algorithms show that privacy risks were extremely small regardless of the strength of privacy protection, but they raise a new question about users' motivation for providing their data.

1. INTRODUCTION

As machine learning (ML) services trained on users' data become increasingly popular, the privacy risks of memorizing training data have been gaining attention (Shokri et al., 2017; Jagielski et al., 2020; Nasr et al., 2021; Malek Esmaeili et al., 2021). To prevent privacy leakage through trained models, privacy-preserving ML based on differential privacy (DP) (Dwork et al., 2006) is the de facto standard. For example, DP-SGD (Song et al., 2013; Abadi et al., 2016) trains neural networks (NNs) by stochastic gradient descent (SGD) with a DP guarantee, and DPBoost (Li et al., 2020) and DPXGBoost (Grislain & Gonzalvez, 2021) train tree-based models with a DP guarantee.

When applying ML to the real world, fairness is another important concern. Recent studies have begun to address privacy protection and fairness jointly: the different effects of DP on majority and minority groups (Bagdasaryan et al., 2019; Pujol et al., 2020; Farrand et al., 2020; Tran et al., 2021), the different vulnerabilities of majority and minority groups to membership inference attacks (MIAs) (Zhang et al., 2020; Zhong et al., 2022), and methods for guaranteeing both group fairness and DP (Xu et al., 2019; 2020). All of these studies focus on group fairness, i.e., fairness between majority and minority groups. In situations where users decide whether to provide their data to ML services, individual fairness (IF), i.e., fairness between individual users, is also important for that decision. However, no study has addressed both IF and privacy protection.

In this paper, we investigate privacy-preserving ML from the perspective of both IF and privacy protection. To this end, we propose a new definition of IF from the perspective of privacy protection and experimentally evaluate privacy-preserving ML based on the proposed IF.
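The DP-SGD update mentioned above clips each per-example gradient and adds calibrated Gaussian noise before the parameter step. The following is a minimal illustrative NumPy sketch of one such update under simplified assumptions (per-example gradients are given directly, a single step, no privacy accounting); it is not the implementation used in the experiments.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.0, lr=0.1, rng=None):
    """One DP-SGD update: clip each example's gradient to L2 norm
    clip_norm, average the clipped gradients, then add Gaussian noise
    with standard deviation noise_multiplier * clip_norm / batch_size."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0, noise_multiplier * clip_norm / len(per_example_grads),
        size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```

Clipping bounds each individual's influence on the update, and the noise masks any remaining per-example contribution; the noise multiplier (together with the number of steps) determines the resulting DP guarantee.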
Assuming that users provide their data to an ML service, we define the proposed IF based on the principle that every user should obtain a gain commensurate with their privacy risk. Furthermore, we discuss the relationship between the proposed IF and prior IF definitions for classification and validate the proposed IF using synthetic data.
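The notion of a user's gain, the accuracy improvement on the user's own data from providing that data, can be sketched as a leave-one-out comparison. The toy nearest-centroid classifier below is a hypothetical illustration of this idea only; the paper's experiments use NNs and tree-based models, and `train`, `accuracy`, and `user_gain` are names introduced here for the sketch.

```python
import numpy as np

def train(data):
    # Toy "model": the per-class mean (centroid) of the feature vectors.
    xs = np.array([x for x, _ in data])
    ys = np.array([y for _, y in data])
    return {c: xs[ys == c].mean(axis=0) for c in np.unique(ys)}

def accuracy(model, data):
    correct = 0
    for x, y in data:
        pred = min(model, key=lambda c: np.linalg.norm(x - model[c]))
        correct += (pred == y)
    return correct / len(data)

def user_gain(others, user_data):
    """Gain of one user: accuracy on the user's own data when the model
    is trained with vs. without that user's data."""
    with_user = accuracy(train(others + user_data), user_data)
    without_user = accuracy(train(others), user_data)
    return with_user - without_user
```

For example, a user whose point lies far from its class centroid may be misclassified by the model trained without it but classified correctly once the data is included, yielding a positive gain; the proposed IF then asks whether such gains are commensurate with each user's privacy risk.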

