TOPOLOGY MATTERS IN FAIR GRAPH LEARNING: A THEORETICAL PILOT STUDY

Abstract

Recent advances in fair graph learning observe that graph neural networks (GNNs) amplify prediction bias compared with multilayer perceptrons (MLPs), while the reason behind this remains unknown. In this paper, we conduct a theoretical analysis of the bias amplification mechanism in GNNs. This is a challenging task, since GNNs are difficult to interpret and real-world networks are complex. To bridge the gap, we demonstrate both theoretically and experimentally that the aggregation operation in representative GNNs accumulates bias in node representations due to topology bias induced by the graph structure. We provide a sufficient condition on the statistics of graph data under which graph aggregation amplifies prediction bias in GNNs. Motivated by this data-centric finding, we propose a fair graph refinement algorithm, named FairGR, which rewires the graph topology to reduce the sensitive homophily coefficient while preserving useful topological information. Experiments on node classification tasks demonstrate that FairGR mitigates prediction bias with comparable performance on three real-world datasets. Additionally, by refining the graph topology, FairGR is compatible with many state-of-the-art methods, such as regularization, adversarial debiasing, and fair mixup. FairGR is therefore a plug-in fairness method that can be adapted to improve existing fair graph learning strategies.

1. INTRODUCTION

Graph neural networks (GNNs) (Kipf & Welling, 2017; Veličković et al., 2018; Wu et al., 2019) are widely adopted in various domains, such as social media mining (Hamilton et al., 2017), knowledge graphs (Hamaguchi et al., 2017), and recommender systems (Ying et al., 2018), due to their remarkable performance in learning representations. Graph learning, a topic of growing popularity, aims to learn node representations that capture both topological and attribute information of a given graph. Despite their outstanding performance on various tasks, GNNs still inherit or even amplify societal bias from input graph data (Dai & Wang, 2021). Biased node representations largely limit the application of GNNs in many high-stakes tasks, such as job hunting (Mehrabi et al., 2021) and crime ratio prediction (Suresh & Guttag, 2019). Hence, bias mitigation that facilitates research on fair GNNs is urgently needed. In many real-world graphs, nodes with the same sensitive attribute (e.g., age) are more likely to connect; for example, young people mainly make friends with people of similar ages (Dong et al., 2016). We call this phenomenon "topology bias". Even worse, in GNNs the representation of each node is learned by aggregating the representations of its neighbors, so nodes with the same sensitive attribute become more similar after aggregation. To illustrate, we visualize the topology bias for three real-world datasets (Pokec-n, Pokec-z, and NBA) in Figure 1, where different edge types are highlighted with different colors for the top-3 largest connected components of the original graphs. Such topology bias makes the representations of nodes with the same sensitive attribute more similar, which is a major source of graph representation bias. Existing bias mitigation work for GNNs is empirical, relying on regularization, adversarial debiasing, or contrastive learning.
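To make the two ideas above concrete, the sketch below computes a sensitive homophily coefficient on a toy graph and applies one step of mean aggregation. The graph, the random features, and the homophily definition used here (fraction of edges whose endpoints share the sensitive attribute) are illustrative assumptions for this sketch, not the paper's datasets or its formal definitions.

```python
import numpy as np

# Toy graph: 6 nodes with a binary sensitive attribute s (e.g., two age groups).
s = np.array([0, 0, 0, 1, 1, 1])
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (2, 3)]  # mostly intra-group

# Sensitive homophily: fraction of edges joining same-attribute endpoints
# (one common definition; the paper's exact coefficient may differ).
same = sum(s[u] == s[v] for u, v in edges)
homophily = same / len(edges)  # 5 of 6 edges are intra-group here

# One mean-aggregation step (GNN-style propagation with self-loops).
n = len(s)
A = np.eye(n)
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
A = A / A.sum(axis=1, keepdims=True)  # row-normalize: each node averages its neighborhood

X = np.random.default_rng(0).normal(size=(n, 4))  # random node features
H = A @ X  # aggregated representations

# Because edges are homophilous, same-group nodes average largely the same
# neighbors: nodes 0 and 1 have identical neighborhoods {0, 1, 2}, so their
# aggregated representations coincide exactly -- topology bias in action.
print(f"sensitive homophily: {homophily:.2f}")
print(f"||H[0] - H[1]|| = {np.linalg.norm(H[0] - H[1]):.4f}")
```

Repeating the aggregation step compounds this effect, which is why deeper message passing tends to make same-attribute representations increasingly similar.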
These works are motivated by the fact that graph neural networks trained on graphs may inherit the societal bias in the data, and that the graph topology and the message passing in GNNs can even magnify this bias compared with a multilayer perceptron (MLP) (Dai & Wang, 2021). However, even though fair prediction in GNNs can be achieved via a fair training strategy, a

