ON DYADIC FAIRNESS: EXPLORING AND MITIGATING BIAS IN GRAPH CONNECTIONS

Abstract

Disparate impact has raised serious concerns about machine learning applications and their societal effects. In response to the need to mitigate discrimination, fairness has been regarded as a crucial property in algorithmic design. In this work, we study the problem of disparate impact on graph-structured data. Specifically, we focus on dyadic fairness, which requires that the predicted relationship between two instances be independent of their sensitive attributes. On this basis, we theoretically relate graph connections to dyadic fairness of link predictive scores in learning graph neural networks, and reveal that regulating the weights on existing edges in a graph conditionally contributes to dyadic fairness. We then propose an algorithm, FairAdj, that empirically learns a fair adjacency matrix under proper graph structural constraints for fair link prediction, while preserving predictive accuracy as much as possible. Empirical validation demonstrates that our method delivers effective dyadic fairness in terms of various statistics, and at the same time enjoys a favorable fairness-utility tradeoff.

1. INTRODUCTION

The scale of graph-structured data has grown explosively across disciplines (e.g., social networks, telecommunication networks, and citation networks), calling for robust computational techniques to model, discover, and extract the complex structural patterns hidden in big graph data. A substantial body of work addresses inference over potential connections (Liben-Nowell & Kleinberg, 2007), and the corresponding algorithms can be used for high-quality link prediction and recommendation (Adamic & Adar, 2003; Sarwar et al., 2001; Qi et al., 2006). In this work, we study the potential disparate impact in predicting dyadic relationships between two instances within a homogeneous graph. Despite the wide application of link prediction algorithms, the serious concerns raised by disparate impact (Angwin et al., 2016; Barocas & Selbst, 2016; Bose & Hamilton, 2019a; Liao et al., 2020) must also be reckoned with by algorithm designers. In an algorithmic context, disparate impact describes disparities in influential decisions that essentially derive from characteristics protected by anti-discrimination laws or social norms. Unfortunately, this negative impact, rooted in biased data and conventional algorithms, occurs in many applications including link prediction. One example is a user recommender system that follows the proximity principle (individuals are more likely to interact with similar individuals) or relies on existing connections carrying intrinsic bias. Such an operating mode delivers recommendations dominated by sensitive attributes: for example, users with the same religion or ethnicity are more likely to be recommended to a user, which over time produces segregation in social relations (Hofstra et al., 2017). Another example arises in news streaming.
When a news app has collected a user's political profile, then in pursuit of the user's preferences it may deliver only political content that the user is predisposed to agree with, thereby narrowing the user's perspective by selectively presenting reality (Pariser, 2011). To alleviate these concerns, an algorithm should predict links without being biased by the sensitive attributes of the two instances, and should stream recommendations that are both diverse and relevant. Motivated by such potential bias in real cases, in this paper we propose dyadic fairness for the link prediction problem in homogeneous graphs, where the dyadic fairness criterion expects predictions to be statistically independent of the sensitive attributes of the given two vertices. We focus on Graph Neural Networks (GNNs), which have shown remarkable capacity in graph representation learning via message passing along the graph structure (Xu et al., 2018; 2020; Ying et al., 2018; Wang et al., 2019; Fan et al., 2019; Li et al., 2020). Within the GNN pipeline, given an arbitrary graph, we theoretically analyze the relationship between dyadic fairness and graph connections. Our findings suggest that adapting the weights on existing edges can conditionally contribute to dyadic fairness. Building on these theoretical findings, we propose FairAdj, an algorithm that empirically learns a fair adjacency matrix by updating the normalized adjacency matrix while keeping the original graph structure unchanged. Integrated with a utility objective, the proposed algorithm pursues dyadic fairness and link predictive utility simultaneously. Our definition of dyadic fairness in a graph context is inspired by the statistical metrics of group fairness (Dwork et al., 2012; Kusner et al., 2017). First, vertices in a graph are categorized into several groups according to a protected attribute.
Then, the dyadic fairness criterion asks that standard statistics on link scores, such as the positive-outcome rate or the false positive rate, be approximately equal across intra-group and inter-group pairs. Essentially, such a requirement asks for more diverse predictions between and within the groups defined by the protected attribute; hence it also helps mitigate social segregation by encouraging more interactions across protected groups in the graph. Empirically, we present studies on six real-world social and citation networks to demonstrate the effectiveness of the proposed method, evaluating seven measurements of both utility and dyadic fairness. Compared with baseline methods (Kipf & Welling, 2016b; Grover & Leskovec, 2016; Rahman et al., 2019; Bose & Hamilton, 2019b), we consistently observe improvements in two respects. First, the dyadic fairness metrics verify that our method minimizes the statistical gap between the predictions for intra-group and inter-group links. Second, in terms of utility, our results are consistent with the existing literature (Zhao & Gordon, 2019; Fish et al., 2016; Calders et al., 2009) in that satisfying fairness can reduce utility; however, our algorithm enjoys a more favorable fairness-utility tradeoff (equal fairness at a smaller sacrifice in utility, and vice versa) than previous works. Additionally, to reflect real application scenarios, we showcase a direct product of dyadic fairness: our method effectively streams more diverse recommendations containing instances with different sensitive attributes.
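A common instantiation of this criterion is demographic parity on link scores: the mean predicted score over intra-group pairs (same sensitive attribute) should approximately match that over inter-group pairs. The sketch below, with toy data and function names of our own (not from the paper), computes this gap:

```python
import numpy as np

def dyadic_parity_gap(scores, sens, pairs):
    """|E[score | intra-group] - E[score | inter-group]| over candidate pairs.

    scores: predicted link score per pair, shape (m,)
    sens:   sensitive attribute per node, shape (n,)
    pairs:  (m, 2) array of node-index pairs
    """
    intra = sens[pairs[:, 0]] == sens[pairs[:, 1]]
    return abs(scores[intra].mean() - scores[~intra].mean())

# Toy example: 4 nodes, two sensitive groups, intra pairs scored higher.
sens = np.array([0, 0, 1, 1])
pairs = np.array([[0, 1], [2, 3], [0, 2], [1, 3]])
scores = np.array([0.9, 0.8, 0.2, 0.3])
gap = dyadic_parity_gap(scores, sens, pairs)  # 0.85 - 0.25 = 0.6
```

A perfectly dyadically fair predictor drives this gap to zero; the empirical metrics in Section 5 follow the same intra/inter contrast.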
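To illustrate the kind of adjacency update described above, the following is our own simplified numpy sketch, not the paper's implementation: the nonzero entries of the row-normalized adjacency are treated as free parameters, a finite-difference descent step reduces the intra/inter score gap of a one-hop propagation predictor (a stand-in for a GNN link scorer), and a projection restores row normalization while keeping the original sparsity pattern unchanged.

```python
import numpy as np

def project_rows(W, mask):
    """Keep the original zero pattern and renormalize each row to sum to 1.

    Assumes every node has at least one edge, so no row sum is zero."""
    W = np.where(mask, np.clip(W, 1e-6, None), 0.0)
    return W / W.sum(axis=1, keepdims=True)

def fairadj_step(W, mask, X, sens, lr=0.1, eps=1e-3):
    """One descent step on the intra/inter link-score gap.

    W: row-normalized adjacency (n, n); mask: original edge pattern;
    X: node features (n, d); sens: sensitive attribute per node."""
    def gap(W):
        H = W @ X                       # one propagation step
        S = H @ H.T                     # pairwise link scores
        intra = sens[:, None] == sens[None, :]
        return abs(S[intra].mean() - S[~intra].mean())

    # Numerical gradient on the nonzero entries only; the zero pattern
    # (i.e., the original graph structure) is never altered.
    G = np.zeros_like(W)
    base = gap(W)
    for i, j in np.argwhere(mask):
        Wp = W.copy()
        Wp[i, j] += eps
        G[i, j] = (gap(Wp) - base) / eps
    return project_rows(W - lr * G, mask), base

# Toy graph: 4 nodes, two sensitive groups.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], float)
mask = A > 0
W = project_rows(A, mask)
X = rng.normal(size=(4, 3))
sens = np.array([0, 0, 1, 1])
W_new, g0 = fairadj_step(W, mask, X, sens)
```

The actual method optimizes a learned GNN objective with proper structural constraints rather than this finite-difference surrogate, but the invariants are the same: only the weights of existing edges move, and the result remains a valid normalized adjacency.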

2. RELATED WORK

In this section we review closely related work in both fair machine learning and graph representation learning, and briefly discuss several existing works on learning fair node representations. Fair Machine Learning. Various fairness notions have been proposed and studied, including group fairness (Kusner et al., 2017; Kearns et al., 2018; 2019), individual fairness (Dwork et al., 2012), and preference-based notions (Zafar et al., 2017a; Ustun et al., 2019). Building on these definitions, algorithms incorporating fairness constraints have been developed. Zemel et al. (2013) propose a method to find a representation that maximizes utility while preserving both group and individual fairness. Subsequent work on fair representation learning uses autoencoders (Madras et al., 2018) or adversarial training (Zhao & Gordon, 2019; Zhao et al., 2019; Edwards & Storkey, 2015; Louizos et al., 2016) to remove sensitive patterns while preserving enough information for prediction. Zafar et al. (2017b) optimize for decision-boundary fairness through regularization in logistic regression and support vector machines, while other works achieve fairness via optimal transport between sensitive groups (Gordaliza et al., 2019; Jiang et al., 2019) or fair kernel methods (Donini et al., 2018). However, most fair learning algorithms are built on independent and identically distributed data and cannot be directly applied to graph-structured data with dyadic fairness requirements. Graph Representation Learning. Representation learning on graphs aims to embed a graph into a low-dimensional space while preserving its discriminative and structural information.
Efficient graph analytic methods (Von Luxburg, 2007; Tang et al., 2015; Perozzi et al., 2014; Grover & Leskovec, 2016; Xu et al., 2019) can benefit a series of downstream applications including node classification (Wang et al., 2017), node clustering (Nie et al., 2017), link

