LEARNING FAIR GRAPH REPRESENTATIONS VIA AUTOMATED DATA AUGMENTATIONS

Abstract

We consider fair graph representation learning via data augmentations. While this direction has been explored previously, existing methods invariably rely on certain assumptions about the properties of fair graph data in order to design fixed augmentation strategies. However, the exact properties of fair graph data may vary significantly across scenarios, so heuristically designed augmentations may fail to generate fair graph data in all application settings. In this work, we propose Graphair, a method that learns fair representations based on automated graph data augmentations; such fairness-aware augmentations are themselves learned from data. Graphair is designed to automatically discover fairness-aware augmentations from input graphs in order to circumvent sensitive information while preserving other useful information. Experimental results demonstrate that Graphair consistently outperforms many baselines on multiple node classification datasets in terms of fairness-accuracy trade-off performance. In addition, results indicate that Graphair can automatically learn to generate fair graph data without prior knowledge of fairness-relevant graph properties.

1. INTRODUCTION

Recently, graph neural networks (GNNs) have attracted increasing attention due to their remarkable performance (Gao et al., 2021; Gao & Ji, 2019; Liu et al., 2021a;b; Yuan et al., 2021) in many applications, such as knowledge graphs (Hamaguchi et al., 2017), molecular property prediction (Liu et al., 2022; 2020; Han et al., 2022a), and social media mining (Hamilton et al., 2017). Despite recent advances in graph representation learning (Grover & Leskovec, 2016; Kipf & Welling, 2017; 2016; Gilmer et al., 2017; Han et al., 2022b), GNN models may inherit or even amplify bias from training data (Dai & Wang, 2021), thereby producing predictions that discriminate against certain groups defined by sensitive attributes, such as race and gender. Such discriminatory behavior may raise serious ethical and societal concerns, limiting the application of GNNs to many real-world high-stakes tasks, such as criminal justice (Suresh & Guttag, 2019), job hunting (Mehrabi et al., 2021), healthcare (Rajkomar et al., 2018), and credit scoring (Feldman et al., 2015; Petrasic et al., 2017). Hence, it is highly desirable to learn fair graph representations free of discriminatory bias (Dong et al., 2022; Zhang et al., 2022; Kang et al., 2022; Dai et al., 2022).

A primary issue in fairness (Mehrabi et al., 2021; Olteanu et al., 2019) is that training data usually contain biases, which are the source of models' discriminatory behavior. Therefore, many existing works (Agarwal et al., 2021; Kose & Shen, 2022; Spinelli et al., 2021) propose to learn fair graph representations by modifying the training data with fairness-aware graph data augmentations. These methods posit certain graph data properties that benefit fair representation learning, and then apply heuristic graph data augmentation operations, such as node feature masking and edge perturbation, to refine the graph data.
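As context for these two operations, the following minimal sketch (not the implementation used in any of the cited works; function names and probabilities are illustrative assumptions) shows random node feature masking and random edge perturbation on a dense adjacency matrix:

```python
import numpy as np

def mask_node_features(X, mask_prob=0.1, rng=None):
    """Randomly zero out entries of the node feature matrix X."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(X.shape) >= mask_prob  # True where the entry is kept
    return X * keep

def perturb_edges(A, flip_prob=0.05, rng=None):
    """Randomly flip entries of a symmetric 0/1 adjacency matrix A."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    flip = rng.random((n, n)) < flip_prob
    flip = np.triu(flip, k=1)  # decide each undirected edge once (exclude diagonal)
    flip = flip | flip.T       # mirror to keep A symmetric
    return np.where(flip, 1 - A, A)

# Toy graph: a 4-node path with 3 features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
X = np.ones((4, 3))
A_aug = perturb_edges(A, flip_prob=0.3)
X_aug = mask_node_features(X, mask_prob=0.3)
```

Fixed-strategy methods apply such operations with hand-chosen probabilities; the point of the present work is to learn the augmentation itself rather than fix `mask_prob` and `flip_prob` heuristically.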
However, the graph properties proposed in prior work (Spinelli et al., 2021; Kose & Shen, 2022) may not be appropriate for all graph datasets due to the diverse nature of graph data. For example, balancing inter/intra edges (Kose & Shen, 2022) may destroy topology

