GUARDHFL: PRIVACY GUARDIAN FOR HETEROGENEOUS FEDERATED LEARNING

Abstract

Heterogeneous federated learning (HFL) enables clients with different computation and communication capabilities to collaboratively train their own customized models via a query-response paradigm on auxiliary datasets. However, such a paradigm raises serious privacy issues due to the leakage of highly sensitive query samples and response predictions. Although existing secure querying solutions may be extended to enhance the privacy of HFL with non-trivial adaptation, they suffer from two key limitations: (1) lacking customized protocol designs and (2) relying on heavy cryptographic primitives, which could lead to poor performance. In this work, we put forth GuardHFL, the first-of-its-kind efficient and privacy-preserving HFL framework. GuardHFL is equipped with a novel HFL-friendly secure querying scheme that is built on lightweight secret sharing and symmetric-key techniques. Its core is a set of customized multiplication and comparison protocols, which substantially boost the execution efficiency. Extensive evaluations demonstrate that GuardHFL outperforms the state-of-the-art works in both runtime and communication overhead.

1. INTRODUCTION

As a promising variant of federated learning (FL), heterogeneous federated learning (HFL) (Li & Wang, 2019) enables clients equipped with different computation and communication capabilities to collaboratively train their own customized models that may differ in size, numerical precision, or structure (Lin et al., 2020). In particular, the knowledge of models is shared via a query-response paradigm on auxiliary datasets, such as unlabeled datasets from the same task domain (Choquette-Choo et al., 2021) or related datasets from different task domains (Li & Wang, 2019; Lin et al., 2020). In such a paradigm, each client queries others with samples from the auxiliary querying dataset and obtains aggregated response predictions via a centralized cloud server¹. Each client then retrains its local model on the query data and the corresponding predictions. This flexible approach facilitates customized FL-driven services in areas like healthcare and finance (Kairouz et al., 2019), while resolving the intellectual property concerns of FL models (Tekgul et al., 2021).

However, HFL suffers from several privacy issues. First, directly sharing query samples violates their privacy. For example, in healthcare applications, the auxiliary datasets may contain patients' medical conditions; disclosure of such information is illegal under current regulations like the General Data Protection Regulation (GDPR). Second, sharing response predictions may still compromise the privacy of local data (Papernot et al., 2016). Several works have shown that given black-box access to a model, adversaries can infer the membership (Salem et al., 2019) and attribute information (Ganju et al., 2018) of the target sample or even reconstruct the original training data (Yang et al., 2019).
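To make the query-response paradigm concrete, the following is a minimal plaintext sketch of one round: a querying client sends auxiliary samples, the other clients respond with their models' predictions, and the server averages those predictions into soft labels for local retraining. All names and the linear-model setup are illustrative assumptions, and this sketch deliberately omits the cryptographic protections that GuardHFL adds on top.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(model_weights, x):
    # A client shares only softmax predictions, never its weights.
    logits = x @ model_weights
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Three heterogeneous clients: in real HFL their architectures differ;
# here heterogeneity is reduced to independently drawn weight matrices.
num_features, num_classes = 8, 4
clients = [rng.normal(size=(num_features, num_classes)) for _ in range(3)]

# Client 0 queries the others with unlabeled auxiliary samples.
queries = rng.normal(size=(5, num_features))
responses = [predict(w, queries) for w in clients[1:]]

# The server aggregates the responders' predictions by averaging;
# client 0 would then retrain locally on (queries, aggregated) as
# soft labels (the retraining step is omitted in this sketch).
aggregated = np.mean(responses, axis=0)

assert aggregated.shape == (5, num_classes)
assert np.allclose(aggregated.sum(axis=1), 1.0)
```

Note that both the queries and the aggregated predictions travel in the clear here, which is exactly the leakage the paper's secure querying scheme is designed to prevent.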
Although in traditional FL systems the privacy issue could be mitigated through well-studied secure gradient aggregation protocols (Bell et al., 2020), realizing this guarantee in HFL is more challenging due to the heterogeneity of the clients' models (refer to Appendix A.2.3). To bridge this gap, a possible solution is to integrate existing secure querying (a.k.a. private inference) schemes (Rathee et al., 2020; Huang et al., 2022; Wagh et al., 2019; Tan et al., 2021) into HFL. These schemes utilize various cryptographic primitives, including homomorphic encryption
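As background for the lightweight primitives mentioned above, the sketch below shows 2-out-of-2 additive secret sharing over a power-of-two ring, plus Beaver-triple multiplication, the standard building blocks behind many secret-sharing-based secure querying schemes. This is an illustrative toy (a single process plays both parties and the triple dealer), not GuardHFL's actual customized protocols.

```python
import random

RING = 1 << 32  # arithmetic modulo 2^32

def share(x):
    # Split x into two shares that are individually uniformly random.
    r = random.randrange(RING)
    return r, (x - r) % RING

def reconstruct(s0, s1):
    return (s0 + s1) % RING

def make_triple():
    # Offline phase: a dealer distributes shares of (a, b, c = a*b).
    a, b = random.randrange(RING), random.randrange(RING)
    return share(a), share(b), share((a * b) % RING)

def beaver_mul(x_sh, y_sh, triple):
    a_sh, b_sh, c_sh = triple
    # The parties jointly open d = x - a and e = y - b (each opens
    # only its own share, so x and y themselves stay hidden).
    d = (x_sh[0] - a_sh[0] + x_sh[1] - a_sh[1]) % RING
    e = (y_sh[0] - b_sh[0] + y_sh[1] - b_sh[1]) % RING
    # Local computation: z = c + d*b + e*a + d*e equals x*y.
    z = []
    for i in range(2):
        zi = (c_sh[i] + d * b_sh[i] + e * a_sh[i]) % RING
        if i == 0:  # the constant term is added by one party only
            zi = (zi + d * e) % RING
        z.append(zi)
    return tuple(z)

x, y = 1234, 5678
assert reconstruct(*share(x)) == x                       # sharing is lossless
z_sh = beaver_mul(share(x), share(y), make_triple())
assert reconstruct(*z_sh) == (x * y) % RING              # secure multiplication
```

Additions of shared values are purely local, so the expensive steps are multiplications and comparisons, which is why the paper's customized protocols for exactly those two operations drive its efficiency gains.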

¹ As demonstrated in Bonawitz et al. (2017); Bell et al. (2020), clients (e.g., mobile devices) in real-world applications are generally widely distributed and coordinated only by the server.

