A NEW PARADIGM FOR FEDERATED STRUCTURE NON-IID SUBGRAPH LEARNING

Anonymous authors
Paper under double-blind review

Abstract

Federated graph learning (FGL), a distributed training framework for graph neural networks (GNNs), has attracted much attention for relaxing the centralized assumptions of machine learning. Despite its effectiveness, differences in data collection perspectives and quality lead to heterogeneity challenges, especially when a domain-specific graph is partitioned into subgraphs held by different institutions. However, existing FGL methods implement graph data augmentation or personalization under community split, which follows the cluster homogeneity assumption. We investigate these issues and suggest that subgraph heterogeneity essentially stems from structure variations. Based on observations of FGL, we first define the structure non-independent identical distribution (Non-IID) problem, which presents unique challenges among client-wise subgraphs. We then propose a new paradigm for general federated data settings called Adaptive Federated Graph Learning (AdaFGL). The motivation behind it is to implement adaptive propagation mechanisms based on federated global knowledge and non-parametric label propagation. We conduct extensive experiments under both community split and structure Non-IID settings; our approach achieves state-of-the-art performance on five benchmark datasets.
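As background for the non-parametric label propagation mentioned above, the classical scheme iteratively spreads known labels over the normalized adjacency matrix while clamping the labeled nodes. The following minimal NumPy sketch illustrates that scheme only; the function name, hyperparameters, and toy graph are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def label_propagation(adj, labels, mask, num_iters=50, alpha=0.9):
    """Iterative label propagation: Y <- alpha * (D^-1 A) Y + (1 - alpha) * Y0.

    adj    : (n, n) symmetric adjacency matrix
    labels : (n,) integer class ids (ignored where mask is False)
    mask   : (n,) boolean, True for nodes with known labels
    """
    n = adj.shape[0]
    num_classes = labels.max() + 1
    # One-hot seed matrix; unlabeled rows start at zero.
    y0 = np.zeros((n, num_classes))
    y0[mask, labels[mask]] = 1.0
    # Random-walk (row) normalization of the adjacency matrix.
    deg = adj.sum(axis=1, keepdims=True)
    p = adj / np.maximum(deg, 1.0)
    y = y0.copy()
    for _ in range(num_iters):
        y = alpha * (p @ y) + (1 - alpha) * y0
        y[mask] = y0[mask]  # clamp the known labels each step
    return y.argmax(axis=1)

# Toy path graph 0-1-2-3 with the two end nodes labeled.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])
mask = np.array([True, False, False, True])
pred = label_propagation(adj, labels, mask)  # -> [0, 0, 1, 1]
```

Because the method has no trainable parameters, its predictions depend only on the graph structure, which is why its agreement with a GNN can serve as a structural signal.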

1. INTRODUCTION

The graph, as a relational data structure, is widely used to model real-world entity relations such as citation networks Yang et al. (2016a), recommender systems Wu et al. (2022), drug discovery Gaudelet et al. (2021), particle physics Shlomi et al. (2021), etc. However, due to collection agents and privacy concerns, the global domain-specific graph generally consists of many subgraphs collected by multiple institutions. To analyze its local subgraph, each client maintains a powerful graph mining model such as a graph neural network (GNN), which has achieved state-of-the-art performance in many graph learning tasks Zhang et al. (2022b); Hu et al. (2021); Zhang & Chen (2018). Despite their effectiveness, the limited local data yield sub-optimal performance in most cases. Motivated by the success of federated learning (FL), a natural idea is to combine GNNs with FL to utilize the distributed subgraphs. Recently, federated graph learning (FGL) He et al. (2021); Wang et al. (2022b) has been proposed to achieve collaborative training without directly sharing data, yet an essential concern is the heterogeneity of the distributed subgraphs.

Notably, graph heterogeneity differs from the label or feature heterogeneity studied in computer vision or natural language processing; we suggest that it depends on the graph structure. However, existing FGL methods simulate federated subgraph distributions through community split, which follows the cluster homogeneity assumption shown in Fig. 1(a). Specifically, community split keeps each subgraph's structure consistent with the original graph, e.g., connected nodes are more likely to have the same labels. This assumption is overly idealistic and hard to satisfy in reality, hence we consider a more reasonable setting shown in Fig. 1(c). We refer to this problem as structure non-independent identical distribution (Non-IID). The motivation behind it is that the graph structure is directly related to the node label and feature distributions. Meanwhile, the challenges of structure heterogeneity are ubiquitous in the real world Zheng et al. (2022b). For instance, in citation networks, we can regard research teams focused on computer science and intersectional fields (e.g., AI in Science) Shlomi et al. (2021); Gaudelet et al. (2021) as clients. In online transaction networks, fraudsters are more likely to build connections with customers instead of with other fraudsters.

