PERSONALIZED SUBGRAPH FEDERATED LEARNING

Abstract

In real-world scenarios, subgraphs of a larger global graph may be distributed across multiple devices or institutions and be only locally accessible due to privacy restrictions, even though there may be links between them. Recently proposed subgraph Federated Learning (FL) methods deal with these missing links across private local subgraphs while distributively training Graph Neural Networks (GNNs) on them. However, they overlook the inevitable heterogeneity among subgraphs, which arises because subgraphs comprise different communities of the global graph, and consequently collapse incompatible knowledge from local GNN models trained on heterogeneous graph distributions. To overcome this limitation, we introduce a new subgraph FL problem, personalized subgraph FL, which focuses on the joint improvement of the interrelated local GNN models rather than on learning a single global GNN model, and propose a novel framework, FEDerated Personalized sUBgraph learning (FED-PUB), to tackle it. A crucial challenge in personalized subgraph FL is that the server does not know which subgraph each client has. FED-PUB therefore computes functional embeddings of the local GNNs by feeding them the same random graphs as inputs, measures similarities between clients from these embeddings, and uses the similarities to perform weighted averaging for server-side aggregation. Further, it learns a personalized sparse mask at each client to select and update only the subgraph-relevant subset of the aggregated parameters. We validate FED-PUB on six datasets, considering both non-overlapping and overlapping subgraphs, and show that it largely outperforms relevant baselines.
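The similarity-based aggregation described above can be sketched as follows. This is a minimal, illustrative reconstruction, not the authors' implementation: the function names (`functional_embedding`, `aggregation_weights`, `personalized_average`) and the softmax temperature `tau` are assumptions, and the real method operates on GNN weight tensors rather than flat vectors.

```python
import numpy as np

def functional_embedding(model_fn, random_graph):
    """Run a client's local GNN on a shared random graph and flatten the
    outputs; models with similar functions yield similar embeddings."""
    return np.asarray(model_fn(random_graph)).ravel()

def aggregation_weights(embeddings, tau=1.0):
    """Cosine similarities between clients' functional embeddings,
    softmax-normalized per client (each row sums to 1)."""
    E = np.stack(embeddings)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = E @ E.T
    w = np.exp(sims / tau)
    return w / w.sum(axis=1, keepdims=True)

def personalized_average(client_params, weights):
    """Server-side aggregation: client i receives a weighted average of all
    clients' parameters, weighted by functional similarity to client i."""
    P = np.stack(client_params)  # shape: (num_clients, num_params)
    return weights @ P           # row i = personalized parameters for client i
```

Under this sketch, clients whose subgraphs induce functionally similar local models receive high mutual aggregation weights, so knowledge is shared mostly within similar communities rather than collapsed into one global model.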

1. INTRODUCTION

Most previous Graph Neural Networks (GNNs) (Hamilton, 2020) focus on a single graph whose nodes and edges, collected from multiple sources, are stored in a central server. For instance, in a social network platform, every user, with his/her social connections, contributes to a giant network consisting of all users and their links. In some practical scenarios, however, each user/institution collects its own private graph, which is only locally accessible due to privacy restrictions. For instance, as described in Zhang et al. (2021), each hospital may have its own patient interaction network to track physical contacts or co-diagnosis of a disease, but such a graph may not be shared with others. How can we then collaboratively train a neural network on subgraphs distributed across multiple participants (i.e., clients), without sharing the actual data? The most straightforward way is to perform Federated Learning (FL) with GNNs: each client individually trains a local GNN on its private local data, while a central server aggregates the locally updated GNN weights from multiple clients into one model and then transmits it back to the clients.

However, an important challenge in such a subgraph FL scenario is how to deal with potentially missing edges between subgraphs, which are not captured by individual data owners but may carry important information (see Figure 1 (A)). Recent subgraph FL methods (Wu et al., 2021a; Zhang et al., 2021) tackle this problem by expanding the local subgraph with information from other subgraphs, as illustrated in Figure 1 (B). In particular, they expand the local subgraph either by exactly augmenting the relevant nodes from the other subgraphs at the other clients (Wu et al., 2021a), or by estimating those nodes using the node information in the other subgraphs (Zhang et al., 2021). However, such sharing of node information may compromise data privacy and can incur high communication costs.
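The straightforward FL baseline described above, a single global model built by size-weighted averaging of local updates (FedAvg-style), can be sketched as follows. This is an illustrative sketch, not FED-PUB itself; the function name `fedavg` and the use of flat parameter vectors are simplifying assumptions.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Server-side aggregation for one communication round: average the
    locally trained parameter vectors, weighted by each client's number
    of training nodes, into a single global model."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()   # normalize so the weights sum to 1
    P = np.stack(client_params)     # shape: (num_clients, num_params)
    return weights @ P              # one global parameter vector

# In a full round, clients first train locally on their private subgraphs
# (omitted here), the server averages as above, and the single global model
# is then broadcast back to every client.
```

Note that every client receives the same global model under this scheme, which is exactly what breaks down when local subgraphs come from heterogeneous communities, as discussed next.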
Also, there exists a more important challenge that has been overlooked by existing subgraph FL methods. We observe that they suffer from large performance degradation (see Figure 1, right) due to the heterogeneity among subgraphs, which is natural since subgraphs comprise different parts of a global

