FAIR ATTRIBUTE COMPLETION ON GRAPH WITH MISSING ATTRIBUTES

Abstract

Tackling unfairness in graph learning models is a challenging task, as the unfairness issues on graphs involve both attributes and topological structures. Existing work on fair graph learning simply assumes that the attributes of all nodes are available for model training and then makes fair predictions. In practice, however, the attributes of some nodes might not be accessible due to missing data or privacy concerns, which makes fair graph learning even more challenging. In this paper, we propose FairAC, a fair attribute completion method, to complement missing information and learn fair node embeddings for graphs with missing attributes. FairAC adopts an attention mechanism to deal with the missing attribute problem and, at the same time, mitigates two types of unfairness, i.e., feature unfairness from attributes and topological unfairness due to attribute completion. FairAC can work on various types of homogeneous graphs and generate fair embeddings for them, and thus can be applied to most downstream tasks to improve their fairness performance. To the best of our knowledge, FairAC is the first method that jointly addresses the graph attribute completion and graph unfairness problems. Experimental results on benchmark datasets show that our method achieves better fairness performance with less sacrifice in accuracy, compared with the state-of-the-art methods of fair graph learning.

1. INTRODUCTION

Graphs, such as social networks, biomedical networks, and traffic networks, are commonly observed in many real-world applications. Many graph-based machine learning methods have been proposed in the past decades, and they have shown promising performance in tasks like node similarity measurement, node classification, graph regression, and community detection. In recent years, graph neural networks (GNNs) have been actively studied (Scarselli et al., 2008; Wu et al., 2020; Jiang et al., 2019; 2020; Zhu et al., 2021c; b; a; Hua et al., 2020; Chu et al., 2021). GNNs can model graphs with high-dimensional attributes in the non-Euclidean space and have achieved great success in many areas such as recommender systems (Sheu et al., 2021).

However, it has been observed that many graphs are biased, and thus GNNs trained on the biased graphs may be unfair with respect to certain sensitive attributes such as demographic groups. For example, in a social network, if users of the same gender have more active connections, GNNs tend to pay more attention to such gender information and introduce gender bias by recommending more friends of the same gender identity to a user while ignoring other attributes like interests. From the data privacy perspective, it is also possible to infer one's sensitive information from the results given by GNNs (Sun et al., 2018). In a time when GNNs are widely deployed in the real world, this severe unfairness is unacceptable. Thus, fairness in graph learning has recently emerged as a notable research topic.

Existing work on fair graph learning mainly focuses on the pre-processing, in-processing, and post-processing steps in the graph learning pipeline in order to mitigate the unfairness issues. The pre-processing approaches modify the original data to conceal sensitive attributes.
Fairwalk (Rahman et al., 2019) is a representative pre-processing method, which gives each group of neighboring nodes an equal chance of being chosen in the sampling process. Among in-processing methods, the most popular approach is to add a sensitive discriminator as a constraint, in order to filter out sensitive information from the original data. For example, FairGNN (Dai & Wang, 2021) adopts a sensitive classifier to filter node embeddings. CFC (Bose & Hamilton, 2019) directly adds a filter layer to deal with unfairness issues. The post-processing methods directly force the final prediction to satisfy fairness constraints, as in Hardt et al. (2016).

When the graphs have complete node attributes, existing fair graph learning methods can obtain promising performance on both fairness and accuracy. However, in practice, graphs may contain nodes whose attributes are entirely missing due to various reasons (e.g., newly added nodes and data privacy concerns). Taking social networks as an example, a newly registered user may have an incomplete profile. Given such incomplete graphs, existing fair graph learning methods would fail, as they assume all the nodes have attributes for model training. Although FairGNN (Dai & Wang, 2021) also involves the missing attribute problem, it only assumes that a part of the sensitive attributes are missing. To the best of our knowledge, addressing the unfairness issue on graphs with some nodes whose attributes are entirely missing has not been investigated before. Another relevant topic is graph attribute completion (Jin et al., 2021; Chen et al., 2020). Such work mainly focuses on completing the graph accurately but ignores the unfairness issues. In this work, we aim to jointly complete a graph with missing attributes and mitigate unfairness at both the feature and topology levels.

In this paper, we study the new problem of learning fair embeddings for graphs with missing attributes.
Specifically, we aim to address two major challenges: (1) how to obtain meaningful node embeddings for graphs with missing attributes, and (2) how to enhance fairness of node embeddings with respect to sensitive attributes. To address these two challenges, we propose a Fair Attribute Completion (FairAC) framework. For the first challenge, we adopt an autoencoder to obtain feature embeddings for nodes with attributes, and we adopt an attention mechanism to aggregate feature information for nodes with missing attributes from their direct neighbors. Then, we address the second challenge by mitigating two types of unfairness, i.e., feature unfairness and topological unfairness. We adopt a sensitive discriminator to regulate embeddings and create a bias-free graph.

The main contributions of this paper are as follows:
(1) We present a new problem of achieving fairness on a graph with missing attributes. Different from the existing work, we assume that the attributes of some nodes are entirely missing.
(2) We propose a new framework, FairAC, for fair graph attribute completion, which jointly addresses unfairness issues from the feature and topology perspectives.
(3) FairAC is a generic approach to complete fair graph attributes, and thus can be used in many graph-based downstream tasks.
(4) Extensive experiments on benchmark datasets demonstrate the effectiveness of FairAC in eliminating unfairness and maintaining comparable accuracy.
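
To illustrate the attribute completion step described above (aggregating feature information for a node with missing attributes from its direct neighbors through attention), the following is a minimal sketch in plain Python. It is not the FairAC implementation. The function names, the dictionary-based graph representation, and the choice of a softmax over dot products of per-node topology vectors as the attention score are all assumptions made for illustration; the autoencoder and the sensitive discriminator are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def complete_embeddings(neighbors, topo, feat):
    """Fill in feature embeddings for nodes that are absent from `feat`.

    neighbors: dict node -> list of direct neighbors
    topo:      dict node -> topology vector (assumed to exist for every node)
    feat:      dict node -> feature embedding, only for nodes with attributes
    """
    completed = dict(feat)
    for node, nbrs in neighbors.items():
        if node in feat:
            continue  # attributes are present, nothing to complete
        known = [n for n in nbrs if n in feat]
        if not known:
            continue  # no attributed direct neighbor to aggregate from
        # Assumed attention score: similarity of topology vectors.
        weights = softmax([dot(topo[node], topo[n]) for n in known])
        dim = len(feat[known[0]])
        completed[node] = [
            sum(w * feat[n][d] for w, n in zip(weights, known))
            for d in range(dim)
        ]
    return completed


# Toy graph: node 0 has no attributes; its neighbors 1 and 2 do.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
topo = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0]}
feat = {1: [2.0, 0.0], 2: [0.0, 4.0]}
completed = complete_embeddings(neighbors, topo, feat)
```

In this toy graph both neighbors receive the same attention weight, so the completed embedding of node 0 is the plain average of its neighbors' embeddings. A fairness-aware variant would additionally penalize completed embeddings from which a sensitive attribute can be predicted, which this sketch does not attempt.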

2.1. FAIRNESS IN GRAPH LEARNING

Recent work promotes fairness in graph-based machine learning (Bose & Hamilton, 2019; Rahman et al., 2019; Dai & Wang, 2021; Wang et al., 2022). These methods can be roughly divided into three categories, i.e., pre-processing methods, in-processing methods, and post-processing methods.

The pre-processing methods are applied before training downstream tasks by modifying the training data. For instance, Fairwalk (Rahman et al., 2019) improves the sampling procedure of node2vec (Grover & Leskovec, 2016). Our FairAC framework can be viewed as a pre-processing method, as it seeks to complete node attributes and use them as the input of graph neural networks. However, our problem is much harder than existing problems, because the attributes of some nodes in the graph are entirely missing, including both the sensitive and the non-sensitive ones. Given an input graph with missing attributes, FairAC generates fair and complete feature embeddings and thus can be applied to many downstream tasks, such as node classification, link prediction (Liben-Nowell & Kleinberg, 2007; Taskar et al., 2003), PageRank (Haveliwala, 2003), etc. Graph learning models trained on the refined feature embeddings would make fair predictions in downstream tasks.

There are plenty of fair graph learning methods that serve as in-processing solutions. Some work focuses on dealing with unfairness issues on graphs with complete features. For example, GEAR (Ma et al., 2022) mitigates graph unfairness by counterfactual graph augmentation and an adversarial learning method to learn sensitive-invariant embeddings. However, in order to generate counterfactual subgraphs, it needs precise and complete features for every node. In other words, it cannot work well on a graph containing nodes whose attributes are entirely missing, since it cannot generate counterfactual subgraphs based

availability

https://github.com/donglgcn/FairAC.

