TOWARDS FEDERATED LEARNING OF DEEP GRAPH NEURAL NETWORKS

Abstract

Graph neural networks (GNNs) learn node representations by recursively aggregating neighborhood information on graph data. In the federated setting, however, data samples (nodes) located in different clients may be connected to each other, leading to substantial information loss during training. Existing federated graph learning frameworks address this problem by generating missing neighbors or by sending information across clients directly. Neither approach is suitable for training deep GNNs, which require a more expansive receptive field and thus incur higher communication costs. In this work, we introduce a novel framework named Fed²GNN for federated graph learning of deep GNNs via reconstructing the neighborhood information of nodes. Specifically, we design a graph structure named rooted tree. The node embedding obtained by encoding on the rooted tree is the same as that obtained by encoding on the induced subgraph surrounding the node, which allows us to reconstruct the neighborhood information by building the rooted tree of the node. An encoder-decoder framework is then proposed, wherein we first encode missing neighbor information and then decode it to build the rooted tree. Extensive experiments on real-world network datasets show the effectiveness of our framework for training deep GNNs while also achieving better performance for training shallow GNN models.¹

1. INTRODUCTION

Recently, Graph Neural Networks (GNNs) have attracted significant attention due to their powerful ability for representation learning on graph-structured data (Hamilton et al., 2017a; Kipf & Welling, 2017; Hamilton et al., 2017b). Generally speaking, a GNN adopts a recursive neighborhood aggregation (or message passing) scheme to learn node representations by considering node features and graph topology together (Xu et al., 2018). After k iterations of aggregation, a node's representation captures the information within its k-hop neighborhood. As in other domains, training a well-performing GNN model requires training data that are not only sufficiently large but also heterogeneous, for better generalization. In reality, however, heterogeneous data are often stored separately in different clients and cannot be shared due to policies and privacy concerns. To that end, recent works have proposed federated training of GNNs (Zhang et al., 2021; Peng et al., 2021; Yao & Joe-Wong, 2022; Chen et al., 2022). They typically consider a framework wherein each client iteratively updates node representations with a semi-supervised model on its local graph; the models are then aggregated at a central server. The main challenge is that data samples (nodes) located in different clients may be connected to each other. Hence, it is non-trivial to account for connected nodes (i.e., neighbor nodes) located in other clients when applying node updates. Although existing works focus on recovering missing neighborhood information for nodes, they either consider only immediate neighbors (Zhang et al., 2021; Peng et al., 2021) or incur communication costs that increase exponentially with the neighbors' distance (Yao & Joe-Wong, 2022; Chen et al., 2022).
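To make the challenge concrete, the k-hop receptive field of a node can be enumerated with a breadth-first search; any node in this set that is stored on another client is information the local update cannot see. The sketch below is illustrative only (the adjacency-dict representation and function name are our own, not from the paper):

```python
from collections import deque

def k_hop_neighborhood(adj, root, k):
    """Return the set of nodes within k hops of `root` via BFS.

    `adj` maps each node to its list of neighbors. After k rounds of
    message passing, a GNN's representation of `root` depends on exactly
    this set, so any member stored on another client is missing
    information for the local update.
    """
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

# Toy path graph split across two clients: nodes 0-2 local, 3-4 remote.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
local_2hop = k_hop_neighborhood(adj, 0, 2)   # stays within the local client
wider_3hop = k_hop_neighborhood(adj, 0, 3)   # already reaches remote node 3
```

Deepening the GNN by one layer grows this set by one hop, which is why naive cross-client recovery schemes see their costs blow up with depth.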
None of them are suitable for training deeper GNN models, which require a more expansive receptive field and have been shown to be beneficial for representation learning on graph-structured data (Li et al., 2019; Liu et al., 2020; Zhou et al., 2020a). For GNNs, the receptive field of a node representation is its entire neighborhood. Moreover, Yao & Joe-Wong (2022) also requires computing a weight matrix in advance, which is often unavailable in practice.

In this work, we aim to fundamentally address the above limitations of existing federated graph learning methods by proposing a novel framework named Fed²GNN. The key idea lies in designing a principled approach to reconstructing the neighborhood information of nodes that considers both structure-based information (i.e., graph topology) and feature-based information (i.e., node features). For the structure-based information, we propose a novel graph structure named rooted tree, which has a more regular structure than the original structure of the node neighborhood. More importantly, the node embedding obtained by encoding on the rooted tree is the same as that obtained by encoding on the node's ego-graph (i.e., the induced subgraph surrounding the node). This property allows us to easily reconstruct the structure-based information by building the rooted tree of the node. For the feature-based information, since the structure of the node neighborhood changes, we aim to generate features of the nodes in the rooted tree. Inspired by the structure of the rooted tree, we design a protocol wherein clients recursively transmit information to each other: the data transmitted in the k-th round correspond to nodes in the (k+1)-th layer of the rooted tree. Furthermore, we utilize an encoder-decoder framework to reduce the communication costs so that they grow only linearly with the number of iterations. In more detail, each client first encodes the information and sends the output to other clients.
Other clients build the rooted tree by decoding the received information. By merging all trees into the local graph (with the root node as an anchor), each client obtains a more complete graph on which graph representation learning incurs limited information loss. In summary, we make the following contributions:

• We introduce Fed²GNN, a framework for federated training of GNNs on node-level prediction tasks. We achieve this goal by devising a principled approach to reconstructing missing neighborhood information that considers both structure-based and feature-based information.

• To reconstruct the structure-based information, we propose a novel graph structure named rooted tree, which is easier to construct than the original irregular structure of the node neighborhood. More importantly, the node embedding obtained by encoding on the rooted tree is the same as that obtained by encoding on the node's ego-graph.

• To reconstruct the feature-based information, we propose an encoder-decoder framework that reduces communication costs while incurring limited information loss.

• We conduct extensive experiments to verify the utility of Fed²GNN. The results show that it is effective for training deep GNNs while also achieving better performance when training shallow GNN models.

We outline related works in Section 2 before introducing the problem statement of federated graph learning in Section 3. We then present Fed²GNN in Section 4, where we first introduce the structure of the rooted tree and then describe the neighborhood reconstruction process. We analyze its performance experimentally in Section 5 and conclude in Section 6.
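The communication benefit of the encoder-decoder step can be illustrated with a deliberately simple stand-in: the sender compresses the features of all boundary neighbors into one fixed-size message per round, and the receiver expands that message onto the generated children of the rooted tree. This is only a sketch of the cost argument; Fed²GNN's actual encoder and decoder are learned, and the function names here are our own:

```python
def encode_neighborhood(feature_rows):
    """Toy encoder: compress the feature rows of all boundary neighbors
    into a single mean vector. The per-round payload is thus one
    d-dimensional message, independent of how many neighbors there are."""
    d = len(feature_rows[0])
    n = len(feature_rows)
    return [sum(row[i] for row in feature_rows) / n for i in range(d)]

def decode_neighborhood(code, n_children):
    """Toy decoder: place the received code on each generated child of the
    rooted tree. A learned decoder would replace this plain copy with a
    reconstruction of per-child features."""
    return [list(code) for _ in range(n_children)]

# One communication round: two boundary neighbors become one message.
code = encode_neighborhood([[1.0, 2.0], [3.0, 4.0]])
children = decode_neighborhood(code, 2)
```

Because each of the k rounds transmits one fixed-size code rather than the full (k+1)-th tree layer, the total communication grows linearly in k, matching the cost claim above.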

2. RELATED WORKS

Graph Neural Networks (GNNs) learn a representation for each node in the graph using a stack of graph convolution layers. Each layer takes a vector for each node as input and outputs a new embedding vector by aggregating the vectors of neighbor nodes, followed by a non-linear transform. After k aggregations, the information encoded in a node's representation essentially comes from its k-hop neighborhood. Following the above framework, usually called message passing, several GNN models have been proposed, such as GCN (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017b), GAT (Velickovic et al., 2018), and so on. However, unlike learning tasks in other domains, simply stacking graph convolution layers usually suffers from the over-smoothing issue, which can even degrade performance. Addressing this issue, several works (Liu et al., 2020; Li et al., 2019; Zhou et al., 2020b) propose effective deep GNNs that obtain better performance on graph learning tasks. Their strong performance suggests great potential for federated learning on distributed subgraph data.
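The over-smoothing issue mentioned above can be reproduced with a parameter-free caricature of a graph convolution layer: repeated neighborhood averaging drives all node features toward a common value, so deeper stacks lose discriminative power unless counter-measures (as in the cited deep-GNN works) are applied. A minimal sketch with hypothetical names:

```python
def mean_layer(adj, h):
    """One parameter-free aggregation layer: each node's new feature is
    the average of its own and its neighbors' current features (a
    simplified GCN-style propagation with no weights or non-linearity)."""
    out = {}
    for v, nbrs in adj.items():
        vals = [h[v]] + [h[n] for n in nbrs]
        out[v] = sum(vals) / len(vals)
    return out

def spread(h):
    """Gap between the largest and smallest node feature."""
    return max(h.values()) - min(h.values())

# Path graph 0-1-2-3 with initially well-separated scalar features.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
for _ in range(20):          # stack 20 aggregation layers
    h = mean_layer(adj, h)
# After many layers the features have collapsed toward a constant.
```

With learnable weights and non-linearities the collapse is slower but qualitatively similar, which is why naive deep stacking underperforms and dedicated deep-GNN designs are needed.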



¹ Code available at https://www.dropbox.com/s/unizcyixsmip0je/Fed%5E2GNN.zip?dl=0

