LEARNING FAIR GRAPH REPRESENTATIONS VIA AUTOMATED DATA AUGMENTATIONS

Abstract

We consider fair graph representation learning via data augmentations. While this direction has been explored previously, existing methods invariably rely on certain assumptions about the properties of fair graph data in order to design fixed augmentation strategies. However, the exact properties of fair graph data may vary significantly across scenarios, so heuristically designed augmentations may not always generate fair graph data in different application settings. In this work, we propose a method, known as Graphair, to learn fair representations based on automated graph data augmentations. Such fairness-aware augmentations are themselves learned from data. Graphair is designed to automatically discover fairness-aware augmentations from input graphs in order to circumvent sensitive information while preserving other useful information. Experimental results demonstrate that Graphair consistently outperforms many baselines on multiple node classification datasets in terms of fairness-accuracy trade-off performance. In addition, results indicate that Graphair can automatically learn to generate fair graph data without prior knowledge of fairness-relevant graph properties.

1. INTRODUCTION

Recently, graph neural networks (GNNs) have attracted increasing attention due to their remarkable performance (Gao et al., 2021; Gao & Ji, 2019; Liu et al., 2021a;b; Yuan et al., 2021) in many applications, such as knowledge graphs (Hamaguchi et al., 2017), molecular property prediction (Liu et al., 2022; 2020; Han et al., 2022a), and social media mining (Hamilton et al., 2017). Despite recent advances in graph representation learning (Grover & Leskovec, 2016; Kipf & Welling, 2017; 2016; Gilmer et al., 2017; Han et al., 2022b), GNN models may inherit or even amplify bias from training data (Dai & Wang, 2021), thereby introducing prediction discrimination against certain groups defined by sensitive attributes, such as race and gender. Such discriminative behavior may lead to serious ethical and societal concerns, thus limiting the application of GNNs to many real-world high-stakes tasks, such as criminal justice (Suresh & Guttag, 2019), job hunting (Mehrabi et al., 2021), healthcare (Rajkomar et al., 2018), and credit scoring (Feldman et al., 2015; Petrasic et al., 2017). Hence, it is highly desirable to learn fair graph representations without discriminatory biases (Dong et al., 2022; Zhang et al., 2022; Kang et al., 2022; Dai et al., 2022). A primary issue (Mehrabi et al., 2021; Olteanu et al., 2019) in fairness is that training data usually contain biases, which are the source of the discriminative behavior of models. Therefore, many existing works (Agarwal et al., 2021; Kose & Shen, 2022; Spinelli et al., 2021) propose to learn fair graph representations by modifying training data with fairness-aware graph data augmentations. These methods posit certain graph data properties that are beneficial to fair representation learning, and then adopt heuristic graph data augmentation operations, including node feature masking and edge perturbation, to refine the graph data.
However, the proposed graph properties (Spinelli et al., 2021; Kose & Shen, 2022) may not be appropriate for all graph datasets due to the diverse nature of graph data. For example, balancing inter/intra-group edges (Kose & Shen, 2022) may destroy the topological structure of social networks, leading to the loss of important information. Even when the proposed graph properties are effective, the best properties may vary significantly across scenarios. Hence, it is highly desirable to automatically discover dataset-specific fairness-aware augmentation strategies for different datasets within a single framework. To this end, a natural question arises: Can we achieve fair graph representation learning via automated data augmentations? In this work, we attempt to address this question by proposing Graphair, a novel automated graph augmentation method for fair graph representation learning. A primary challenge is how to achieve fairness and informativeness simultaneously in the augmented data. As we intentionally avoid assuming prior knowledge of what types of graphs are considered fair, we propose to employ an adversary model to predict sensitive attributes from augmented graph data. A fair augmented graph should prevent the adversary model from identifying the sensitive attributes. In addition, we propose to retain useful information from the original graphs by using contrastive learning to maximize the agreement between original and augmented graphs. Experimental results demonstrate that Graphair consistently outperforms many baselines on multiple node classification datasets in terms of fairness-accuracy trade-off performance.
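The two training signals described above (fooling an adversary that predicts sensitive attributes, while preserving agreement between original and augmented representations) can be sketched as follows. This is a minimal NumPy illustration of the general objective structure, not Graphair's actual implementation; the function names, the InfoNCE-style contrastive form, and the loss combination are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversary_loss(adv_logits, s):
    """Binary cross-entropy of an adversary predicting the sensitive
    attribute s from the augmented graph's node representations.
    The augmentation model is trained to MAXIMIZE this loss, so that
    sensitive information cannot be recovered from the augmented data."""
    p = sigmoid(adv_logits)
    return -np.mean(s * np.log(p + 1e-12) + (1 - s) * np.log(1 - p + 1e-12))

def contrastive_loss(h_orig, h_aug, temperature=0.5):
    """InfoNCE-style agreement between original and augmented node
    representations: each node's augmented view is its positive pair,
    all other nodes serve as negatives."""
    h1 = h_orig / np.linalg.norm(h_orig, axis=1, keepdims=True)
    h2 = h_aug / np.linalg.norm(h_aug, axis=1, keepdims=True)
    sim = h1 @ h2.T / temperature               # (n, n) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # positives on the diagonal

# Illustrative combined objective for the augmentation model
# (lambda_adv is a hypothetical trade-off weight):
#   total = contrastive_loss(h_orig, h_aug) - lambda_adv * adversary_loss(adv_logits, s)
```

Minimizing the contrastive term keeps the augmented graph informative about the original, while the negated adversary term pushes the augmentation toward removing sensitive information, which matches the informativeness/fairness tension discussed above.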

2.1. FAIR GRAPH REPRESENTATION LEARNING

In this work, we study the problem of fair graph representation learning. Let $\mathcal{G} = \{A, X, S\}$ be a graph with $n$ nodes. Here, $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix, where $A_{ij} = 1$ if and only if there exists an edge between nodes $i$ and $j$. $X = [x_1, \cdots, x_n]^T \in \mathbb{R}^{n \times d}$ is the node feature matrix, where each $x_i \in \mathbb{R}^d$ is the $d$-dimensional feature vector of node $i$. $S \in \{0, 1\}^n$ is the vector of sensitive attributes (e.g., gender or race) of nodes, which should not be used by machine learning models to make decisions. Our goal is to learn a fair graph representation model $f : (A, X) \rightarrow H \in \mathbb{R}^{n \times d'}$, where the learned representation $H = f(A, X)$ is fed into a classification model $\theta : H \rightarrow \hat{Y} \in \{0, 1\}^n$ to predict the binary labels of the nodes in $\mathcal{G}$. In particular, for an ideally fair model $f$, the output representation $H$ should result in a prediction $\hat{Y}$ that satisfies the fairness criteria. In general, there exist several definitions of fairness, including group fairness (Dwork et al., 2012; Rahmattalabi et al., 2019; Jiang et al., 2022b), individual fairness (Kang et al., 2020; Dong et al., 2021; Petersen et al., 2021), and counterfactual fairness (Agarwal et al., 2021; Ma et al., 2022). In this work, we focus on group fairness, which is defined as $P(\hat{Y}_i \mid S_i = 0) = P(\hat{Y}_i \mid S_i = 1)$, $i = 1, \ldots, n$, where $\hat{Y}_i$ is the prediction for node $i$ and $S_i$ is the sensitive attribute of node $i$. Note that even though the sets of node attributes in $X$ and $S$ are disjoint, correlations may exist between $(A, X)$ and $S$. Hence, even if $S$ is not explicitly exposed to $f$, $f$ may implicitly infer parts of $S$ from $(A, X)$ and produce a biased representation $H$, thereby making the prediction $\hat{Y}$ unfair. Preventing models from fitting these correlations is the central problem in achieving fair graph representation learning. Currently, several studies have proposed different strategies to achieve fair graph representation learning.
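The group-fairness criterion above is demographic parity: predictions should be distributed identically across sensitive groups. For binary predictions, its violation is commonly measured as the absolute gap between the groups' positive-prediction rates. A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def demographic_parity_difference(y_pred, s):
    """Demographic parity gap |P(Y_hat = 1 | S = 0) - P(Y_hat = 1 | S = 1)|.

    y_pred: binary predictions, shape (n,)
    s:      binary sensitive attributes, shape (n,)
    Returns 0.0 for a perfectly group-fair predictor, 1.0 in the worst case.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    rate_s0 = y_pred[s == 0].mean()  # positive rate in group S = 0
    rate_s1 = y_pred[s == 1].mean()  # positive rate in group S = 1
    return abs(rate_s0 - rate_s1)
```

For instance, predictions `[1, 0, 1, 0]` with groups `[0, 0, 1, 1]` give identical positive rates (0.5 in each group), hence a gap of zero, while predicting 1 for one group only yields the maximal gap of one.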
An early study (Rahman et al., 2019) proposes to train the model through fair random walks. Some recent studies (Li et al., 2020; Laclau et al., 2021) propose to reduce prediction discrimination by optimizing adjacency matrices, which can improve fairness in link prediction tasks. In addition, adversarial learning is another popular strategy for achieving fairness in node representation learning tasks. Many studies (Fisher et al., 2020; Dai & Wang, 2021; Bose & Hamilton, 2019) adopt adversarial learning to filter sensitive attribute information out of the learned node representations. Overall, most existing methods learn fair representations by altering the model training strategy with fairness regularization. However, a primary issue in fairness learning lies in the fact that the training data usually contain bias. Hence, an alternative and highly desirable solution is to modify the data through data augmentations, thus enabling models to learn fair representations more easily. In this work, we design a learnable graph augmentation method to reduce bias in graph data, leading to more effective fairness-aware representation learning on graphs.

