GLASU: A COMMUNICATION-EFFICIENT ALGORITHM FOR FEDERATED LEARNING WITH VERTICALLY DISTRIBUTED GRAPH DATA

Anonymous authors
Paper under double-blind review

Abstract

Vertical federated learning (VFL) is a distributed learning paradigm in which computing clients collectively train a model based on the partial features they each possess for the same set of samples. Current research on VFL focuses on the case where samples are independent and rarely addresses the emerging scenario where samples are interrelated through a graph. For graph-structured data, graph neural networks (GNNs) are competitive machine learning models, but a naive implementation in the VFL setting incurs significant communication overhead. Moreover, analyzing the training is challenging because the stochastic gradients are biased. In this paper, we propose a model-splitting method that splits a backbone GNN across the clients and the server, together with a communication-efficient algorithm, GLASU, to train this model. GLASU adopts lazy aggregation and stale updates to skip aggregations when evaluating the model and to skip feature exchanges during training, greatly reducing communication. We offer a theoretical analysis and conduct extensive numerical experiments on real-world datasets, showing that the proposed algorithm effectively trains a GNN model whose performance matches that of the backbone GNN trained in a centralized manner.

1. INTRODUCTION

Vertical federated learning (VFL) is a newly developed machine learning scenario in distributed optimization, where clients share data with the same sample identity but each client possesses only a subset of the features of each sample. The goal is for the clients to collaboratively learn a model based on all the features. Such a scenario appears in many applications, including healthcare, finance, and recommendation systems (Chen et al., 2020b; Liu et al., 2022). For example, in healthcare, each hospital may collect partial clinical data of a patient, such that their conditions and treatments are best predicted by learning from the data collectively; in finance, banks or e-commerce providers may jointly analyze a customer's credit using their trade histories and personal information; and in recommendation systems, online social/review platforms may collect the comments and reviews a user leaves at different websites to predict suitable products for the user. Most current VFL solutions (Chen et al., 2020b; Liu et al., 2022) treat the case where samples are independent and omit their relational structure. However, pairwise relationships between samples arise on many occasions and can be crucial in several learning scenarios, including the low-labeling-rate scenario in semi-supervised learning and the no-labeling scenario in self-supervised learning. Take the financial application as an example: customers and institutions are related through transactions. Such relations can be used to trace financial crimes such as money laundering, to assess the credit risk of a customer, and even to recommend products to them. Each bank and e-commerce provider can infer the relations of the financial individuals registered with them and create a relational graph, in addition to the individual customer information they possess.
One of the most effective machine learning models for handling relational data is the graph neural network (GNN) (Kipf & Welling, 2016; Hamilton et al., 2017; Chen et al., 2018; Velickovic et al., 2018; Chen et al., 2020a). This model performs neighborhood aggregation in every feature transformation layer, so that the prediction for a graph node is based not only on the information of this node but also on that of its neighbors. Although GNNs have been used in federated learning, a majority of existing studies do not address the vertically distributed setting considered here.

The setting under our consideration is fundamentally challenging, because fully leveraging features within neighborhoods causes an enormous amount of communication. One method to design and train a GNN is for each client to use a local GNN to extract node representations from its own subgraph, with the server aggregating these representations to make predictions (Zhou et al., 2020). The drawback of this method is that the partial features of a node outside one client's neighborhood are not used, even if this node appears in another client's neighborhood. Another method is to simulate centralized training: the transformed features of each node are aggregated by the server, from where neighborhood aggregation is performed (Ni et al., 2021). This method suffers the communication overhead incurred in every layer computation.

In this work, we propose a federated GNN model and a communication-efficient training algorithm, named GLASU, for federated learning with vertically distributed graph data. The model is split across the clients and the server, such that the clients can use a majority of existing GNNs as the backbone, while the server contains no model parameters: it only aggregates and disseminates computed data with the clients. The communication frequency between the clients and the server is reduced through lazy aggregation and stale updates (hence the name of the method), with convergence guarantees.
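To make the lazy-aggregation idea concrete, the following toy NumPy sketch simulates a forward pass over M clients whose L-layer local GNNs contact the server only at a chosen subset of layers. All names, the mean aggregator, and the single-matrix adjacency are our own illustrative assumptions, not the paper's actual architecture; the point is only that skipping aggregation layers directly reduces the number of client-server communication rounds.

```python
import numpy as np

def server_aggregate(client_feats):
    # Illustrative server step: average the clients' partial node embeddings.
    return np.mean(client_feats, axis=0)

def gnn_layer(adj, x, w):
    # One generic message-passing layer: neighborhood mixing, then transform.
    return np.tanh(adj @ x @ w)

def forward_lazy(adj_list, x_list, weights_list, agg_layers):
    """Forward pass over M clients with L layers each.

    Server aggregation is performed only at the layers listed in
    `agg_layers` (lazy aggregation); other layers are purely local.
    Returns the final embeddings and the number of communication rounds.
    """
    h_list = [x.copy() for x in x_list]
    num_layers = len(weights_list[0])
    comms = 0
    for layer in range(num_layers):
        h_list = [gnn_layer(adj_list[m], h_list[m], weights_list[m][layer])
                  for m in range(len(h_list))]
        if layer in agg_layers:            # only these layers talk to the server
            agg = server_aggregate(h_list)
            h_list = [agg.copy() for _ in h_list]
            comms += 1
    return h_list, comms
```

Under this sketch, aggregating at every layer mimics per-layer federation (in the spirit of Ni et al., 2021), aggregating only at the last layer mimics local-GNN-plus-final-fusion (in the spirit of Zhou et al., 2020), and intermediate choices interpolate between the two.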
Moreover, GLASU can be considered a framework that encompasses many well-known models and algorithms as special cases, including the work of Liu et al. (2022) when the subgraphs are absent, the work of Zhou et al. (2020) when all aggregations but the final one are skipped, the work of Ni et al. (2021) when no aggregations are skipped, and centralized training when only a single client exists. We summarize the main contributions of this work below:

• Model design: We propose a flexible, federated GNN architecture that is compatible with a majority of existing GNN models.

• Algorithm design: We propose the communication-efficient GLASU algorithm to train the model. Therein, lazy aggregation saves communication in each joint inference round by skipping some aggregation layers of the GNN, while stale updates further save communication by allowing the clients to use stale global information for multiple local model updates.

• Theoretical analysis: We provide a convergence analysis for GLASU that addresses the challenges of biased stochastic gradient estimation caused by neighborhood sampling and of correlated update steps caused by using stale global information.

• Numerical results: We conduct extensive experiments, together with ablation studies, demonstrating that GLASU achieves performance comparable to the centralized model on multiple datasets with multiple GNN backbones, and that GLASU effectively saves communication.
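The stale-updates idea in the contributions above can likewise be illustrated with a small hypothetical sketch: a client runs several local parameter updates while reusing a cached (stale) server-side aggregate, refreshing it only every q steps. The function names, learning rate, and quadratic toy objective are our own assumptions for illustration, not GLASU's actual update rule.

```python
import numpy as np

def stale_update_training(local_grad, w, server_refresh, steps, q):
    """Run `steps` local updates on parameters `w`.

    The server-side aggregate is fetched only every `q` steps and reused
    (stale) in between, so the number of communication rounds is roughly
    steps / q instead of steps.
    """
    refreshes = 0
    agg = None
    for t in range(steps):
        if t % q == 0:               # communicate with the server only here
            agg = server_refresh(w)
            refreshes += 1
        w = w - 0.1 * local_grad(w, agg)   # local step with stale `agg`
    return w, refreshes
```

For example, with a toy objective 0.5 * ||w - agg||^2 whose gradient is `w - agg`, ten local steps with q = 5 cost only two communication rounds while still driving `w` toward the aggregate.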



Figure 1: Data isolation of vertically distributed graph-structured data over three clients.

