ROBUST FAIR CLUSTERING: A NOVEL FAIRNESS ATTACK AND DEFENSE FRAMEWORK

Abstract

Clustering algorithms are widely used in many societal resource allocation applications, such as loan approvals and candidate recruitment, and hence, biased or unfair model outputs can adversely impact individuals who rely on these applications. To this end, many fair clustering approaches have recently been proposed to counteract this issue. Due to the potential for significant harm, it is essential to ensure that fair clustering algorithms provide consistently fair outputs even under adversarial influence. However, fair clustering algorithms have not been studied from an adversarial attack perspective. In contrast to previous research, we seek to bridge this gap and conduct a robustness analysis against fair clustering by proposing a novel black-box fairness attack. Through comprehensive experiments, we find that state-of-the-art models are highly susceptible to our attack, as it can significantly reduce their fairness performance. Finally, we propose Consensus Fair Clustering (CFC), the first robust fair clustering approach, which transforms consensus clustering into a fair graph partitioning problem and iteratively learns to generate fair cluster outputs. Experimentally, we observe that CFC is highly robust to the proposed attack and is thus a truly robust fair clustering alternative.

1. INTRODUCTION

Machine learning models are ubiquitously utilized in many applications, including high-stakes domains such as loan disbursement (Tsai & Chen, 2010), recidivism prediction (Berk et al., 2021; Ferguson, 2014), and hiring and recruitment (Roy et al., 2020; Pombo, 2019), among others. For this reason, it is of paramount importance to ensure that decisions derived from such predictive models are unbiased and fair for all individuals treated (Mehrabi et al., 2021a). In particular, this is the main motivation behind group-level fair learning approaches (Celis et al., 2021a; Li & Liu, 2022; Song et al., 2021), where the goal is to generate predictions that do not disparately impact individuals from minority protected groups (such as ethnicity, sex, etc.). It is also worthwhile to note that this problem is technically challenging because there exists an inherent fairness-performance tradeoff (Dutta et al., 2020), and thus fairness needs to be improved while ensuring approximate preservation of model predictive performance. This line of research is even more pertinent for data clustering, where error rates cannot be directly assessed using class labels to measure disparate impact. Thus, many approaches have recently been proposed to make clustering models group-level fair (Chierichetti et al., 2017; Backurs et al., 2019; Kleindessner et al., 2019a; Chhabra et al., 2022b). In a nutshell, these approaches seek to improve the fairness of clustering outputs with respect to some fairness metric, ensuring that each cluster contains approximately the same proportion of samples from each protected group as appears in the dataset. While many fair clustering approaches have been proposed, it is of the utmost importance to ensure that these models provide fair outputs even in the presence of an adversary seeking to degrade fairness utility.
Although there are some pioneering attempts at fairness attacks against supervised learning models (Solans et al., 2020; Mehrabi et al., 2021b), unfortunately, none of these works propose defense approaches. Moreover, in the unsupervised scenario, fair clustering algorithms have not yet been explored from an adversarial attack perspective, which leaves the whole area of unsupervised fair clustering in potential danger. This leads us to our fundamental research questions in this paper: Are fair clustering algorithms vulnerable to adversarial attacks that seek to decrease fairness utility, and if such attacks exist, how do we develop an adversarially robust fair clustering model?

Contributions. In this paper, we answer both these questions in the affirmative by making the following contributions:

• We propose a novel black-box adversarial attack against fair clustering models in which the attacker perturbs a small percentage of protected group memberships and yet is able to significantly degrade the fairness performance of state-of-the-art fair clustering models (Section 2). We also discuss how our attack is critically different from existing adversarial attacks against clustering performance and why those attacks cannot be used for the proposed threat model.

• Through extensive experiments using our attack approach, we find that existing fair clustering algorithms are not robust to adversarial influence and are extremely volatile with regard to fairness utility (Section 2.2). We conduct this analysis on a number of real-world datasets, and for a variety of clustering performance and fairness utility metrics.

• To achieve truly robust fair clustering, we propose the Consensus Fair Clustering (CFC) model (Section 3), which is highly resilient to the proposed fairness attack. To the best of our knowledge, CFC is the first defense approach for fairness attacks, which makes it an important contribution to the unsupervised ML community.

Preliminaries and Notation.
Given a tabular dataset X = {x_i} ∈ R^{n×d} with n samples and d features, each sample x_i is associated with a protected group membership g(x_i) ∈ [L], where L is the total number of protected groups, and we denote the group memberships for the entire dataset as G = {g(x_i)}_{i=1}^{n} ∈ N^n. We also define H = {H_1, H_2, ..., H_L}, where H_l is the set of samples that belong to the l-th protected group. A clustering algorithm C(X, K) takes as input the dataset X and a parameter K, and outputs a labeling in which each sample belongs to one of K clusters (Xu & Wunsch, 2005). That is, each point is assigned to one of the sets {C_1, C_2, ..., C_K} with ∪_{k=1}^{K} C_k = X. Based on the above, a group-level fair clustering algorithm F(X, K, G) (Chierichetti et al., 2017) can be defined similarly to C: F takes as input the protected group membership G along with X and K, and outputs a fair labeling that is expected to be more fair than the clustering obtained via the original unfair/vanilla clustering algorithm with respect to a given fairness utility function ϕ. That is, ϕ(F(X, K, G), G) ≥ ϕ(C(X, K), G). Note that ϕ can be any fairness utility metric, such as Balance or Entropy (Chhabra et al., 2021a; Mehrabi et al., 2021a).
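As a concrete illustration of one such utility ϕ, the sketch below computes Balance in the style of Chierichetti et al. (2017): for each cluster, take the smallest ratio between the per-group counts it contains, and report the minimum over all clusters. Higher is fairer, with 1.0 meaning every cluster contains all protected groups in equal counts. The function name and exact normalization here are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def balance(labels, groups):
    """Balance of a clustering (Chierichetti et al., 2017 style).

    labels: cluster assignment per sample; groups: protected group per sample.
    For each cluster, compute min/max over the per-group counts inside it;
    the overall Balance is the minimum over clusters. Returns a value in
    [0, 1]; 0 if any cluster misses a protected group entirely.
    """
    labels, groups = np.asarray(labels), np.asarray(groups)
    all_groups = np.unique(groups)
    score = 1.0
    for k in np.unique(labels):
        in_k = groups[labels == k]
        counts = np.array([(in_k == g).sum() for g in all_groups])
        if counts.min() == 0:  # a protected group is absent from this cluster
            return 0.0
        score = min(score, counts.min() / counts.max())
    return float(score)
```

For instance, `balance([0, 0, 1, 1], [0, 1, 0, 1])` returns 1.0, since each cluster contains one member of each group, while a clustering that isolates a group in its own cluster scores 0.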

2. FAIRNESS ATTACK

In this section, we study the attack problem on fair clustering. Specifically, we propose a novel attack that aims to reduce the fairness utility of fair clustering algorithms, as opposed to traditional adversarial attacks that seek to decrease clustering performance (Cinà et al., 2022). To the best of our knowledge, although there are a few pioneering attempts toward fairness attacks (Mehrabi et al., 2021b; Solans et al., 2020), all of them consider the supervised setting. Our proposed attack exposes a novel problem prevalent in fair clustering approaches that has not yet received considerable attention: since the protected group memberships are input to the fair clustering optimization problem, they can be used to disrupt the fairness utility. We study attacks under the black-box setting, where the attacker has no knowledge of the fair clustering algorithm being used. Before formulating the problem in detail, we first define the threat model of the adversary and then elaborate on our proposed attack.

2.1. THREAT MODEL TO ATTACK FAIRNESS

Threat Model. Take customer segmentation (Liu & Zhou, 2017; Nazari & Sheikholeslami, 2021) as an example, and assume that the sensitive attribute considered is age, with 3 protected groups: {youth, adult, senior}. We can now motivate our threat model as follows: the adversary can control a small portion of individuals' protected group memberships (through social engineering, exploiting a security flaw in the system, etc.); by changing these protected group memberships, the adversary aims to disrupt the fairness utility of the fair algorithm on the remaining, uncontrolled samples. That is, some clusters would contain an overwhelming majority of samples from certain protected groups over others.
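The threat model above can be sketched programmatically. The snippet below is a minimal black-box stand-in, not the paper's actual attack optimizer: the adversary, who controls `groups[attack_idx]`, runs a plain random search over flipped memberships and keeps the perturbation that most degrades a fairness utility (e.g., Balance) measured only on the uncontrolled samples. Here `fair_cluster_fn` stands for any fair clustering algorithm queried as a black box; all names and the random-search strategy are our own illustrative assumptions.

```python
import numpy as np

def fairness_attack(X, groups, attack_idx, fair_cluster_fn, utility_fn,
                    n_trials=100, seed=0):
    """Black-box random-search sketch of a fairness attack.

    X: (n, d) data; groups: true protected memberships (length n);
    attack_idx: indices the adversary controls; fair_cluster_fn(X, groups)
    returns a cluster labeling; utility_fn(labels, groups) returns a
    fairness utility (higher = fairer), evaluated only on uncontrolled points.
    """
    rng = np.random.default_rng(seed)
    L = len(np.unique(groups))
    defense_idx = np.setdiff1d(np.arange(len(groups)), attack_idx)
    best_groups, best_util = groups.copy(), np.inf
    for _ in range(n_trials):
        cand = groups.copy()
        # Adversary flips only the memberships it controls.
        cand[attack_idx] = rng.integers(0, L, size=len(attack_idx))
        labels = np.asarray(fair_cluster_fn(X, cand))  # black-box query
        # Fairness is judged w.r.t. the TRUE memberships of uncontrolled points.
        util = utility_fn(labels[defense_idx], groups[defense_idx])
        if util < best_util:
            best_util, best_groups = util, cand
    return best_groups, best_util
```

Note the key asymmetry of the threat model: the perturbed memberships `cand` are fed to the fair clustering algorithm, but the damage is scored against the true memberships of the points the adversary does not control.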



Code available here: https://github.com/anshuman23/CFC.

