ROBUST FAIR CLUSTERING: A NOVEL FAIRNESS ATTACK AND DEFENSE FRAMEWORK

Abstract

Clustering algorithms are widely used in many societal resource allocation applications, such as loan approvals and candidate recruitment, and hence biased or unfair model outputs can adversely impact the individuals who rely on these applications. To this end, many fair clustering approaches have recently been proposed to counteract this issue. Due to the potential for significant harm, it is essential that fair clustering algorithms provide consistently fair outputs even under adversarial influence. However, fair clustering algorithms have not been studied from an adversarial attack perspective. In contrast to previous research, we seek to bridge this gap by conducting a robustness analysis of fair clustering and proposing a novel black-box fairness attack. Through comprehensive experiments, we find that state-of-the-art models are highly susceptible to our attack, as it can significantly reduce their fairness performance. Finally, we propose Consensus Fair Clustering (CFC), the first robust fair clustering approach, which transforms consensus clustering into a fair graph partitioning problem and iteratively learns to generate fair cluster outputs. Experimentally, we observe that CFC is highly robust to the proposed attack and is thus a truly robust fair clustering alternative.

1. INTRODUCTION

Machine learning models are ubiquitously utilized in many applications, including high-stakes domains such as loan disbursement (Tsai & Chen, 2010), recidivism prediction (Berk et al., 2021; Ferguson, 2014), and hiring and recruitment (Roy et al., 2020; Pombo, 2019), among others. For this reason, it is of paramount importance to ensure that decisions derived from such predictive models are unbiased and fair for all individuals involved (Mehrabi et al., 2021a). In particular, this is the main motivation behind group-level fair learning approaches (Celis et al., 2021a; Li & Liu, 2022; Song et al., 2021), where the goal is to generate predictions that do not disparately impact individuals from minority protected groups (defined by attributes such as ethnicity, sex, etc.). It is also worthwhile to note that this problem is technically challenging because there exists an inherent fairness-performance tradeoff (Dutta et al., 2020), and thus fairness needs to be improved while approximately preserving model predictive performance. This line of research is even more pertinent for data clustering, where error rates cannot be directly assessed using class labels to measure disparate impact. Thus, many approaches have recently been proposed to make clustering models group-level fair (Chierichetti et al., 2017; Backurs et al., 2019; Kleindessner et al., 2019a; Chhabra et al., 2022b). In a nutshell, these approaches seek to improve the fairness of clustering outputs with respect to some fairness metric, ensuring that each cluster contains approximately the same proportion of samples from each protected group as appears in the dataset overall. While many fair clustering approaches have been proposed, it is of the utmost importance to ensure that these models provide fair outputs even in the presence of an adversary seeking to degrade fairness utility.
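To make the group-level fairness notion above concrete, the following is a minimal sketch of a proportion-based "balance" metric in the spirit of Chierichetti et al. (2017): for every cluster, each protected group's in-cluster proportion is compared against its dataset-wide proportion, and the worst-case ratio over all clusters and groups is reported (1.0 indicates perfectly proportional clusters, 0.0 indicates some cluster is entirely missing a group). The function name and exact formulation here are illustrative, not the specific metric used in this paper.

```python
from collections import Counter, defaultdict

def cluster_balance(cluster_labels, group_labels):
    """Worst-case ratio between a group's proportion inside a cluster and
    its proportion in the full dataset (an illustrative balance metric).
    Returns a value in [0, 1]; higher is fairer."""
    n = len(group_labels)
    dataset_counts = Counter(group_labels)          # group sizes overall
    cluster_sizes = Counter(cluster_labels)         # cluster sizes
    cluster_counts = defaultdict(Counter)           # group sizes per cluster
    for c, g in zip(cluster_labels, group_labels):
        cluster_counts[c][g] += 1

    balance = 1.0
    for c, size in cluster_sizes.items():
        for g, total in dataset_counts.items():
            prop_cluster = cluster_counts[c][g] / size
            prop_dataset = total / n
            if prop_cluster == 0:
                return 0.0  # a cluster contains no members of group g
            balance = min(balance,
                          prop_cluster / prop_dataset,
                          prop_dataset / prop_cluster)
    return balance

# Perfectly proportional clusters score 1.0; fully segregated ones score 0.0.
print(cluster_balance([0, 0, 1, 1], [0, 1, 0, 1]))  # 1.0
print(cluster_balance([0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0
```

A fairness attack in this setting aims to drive such a metric toward zero, while a robust method like CFC aims to keep it close to its unattacked value.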
Although there are some pioneering attempts at fairness attacks against supervised learning models (Solans et al., 2020; Mehrabi et al., 2021b), unfortunately, none of these works propose defense approaches. Moreover, in the unsupervised scenario, fair clustering algorithms have not yet



1 Code available here: https://github.com/anshuman23/CFC.

