FEDERATED LEARNING WITH OPENSET NOISY LABELS

Abstract

Federated learning is a learning paradigm that allows a central server to learn from different data sources while keeping the data private on each client. Without control over or monitoring of the local data collection process, the locally available training labels are likely to be noisy, just as in a centralized data collection effort. Moreover, different clients may hold samples within different label spaces. The noisy label space is likely to differ from the unobservable clean label space, resulting in openset noisy labels. In this work, we study the challenge of federated learning from clients with openset noisy labels. We observe that many existing solutions in the noisy-label literature, e.g., loss correction, cannot achieve their originally claimed effect in local training. A central contribution of this work is an approach that communicates globally randomly selected "contrastive labels" among clients to prevent local models from individually memorizing the openset noise patterns. Randomized label generation is applied during label sharing to facilitate access to the contrastive labels while ensuring differential privacy (DP). Both the DP property and the effectiveness of our approach are theoretically guaranteed. Compared with several baseline methods, our solution demonstrates its effectiveness on several public benchmarks and real-world datasets under different noise ratios and noise models.

1. INTRODUCTION

With the development of distributed computation, federated learning (FL) has emerged as a powerful learning paradigm for its ability to train with data from multiple clients under strong data privacy protection (McMahan et al., 2017; Kairouz et al., 2021; Yang et al., 2019). Since each distributed client has a different collection and annotation process, the observed data distributions are likely to be highly heterogeneous and noisy. This paper provides solutions for a practical FL setting where not only does each client's training set carry a different label noise rate, but the observed label spaces at these clients also differ, even though the underlying clean labels are drawn from the same label space. For example, in a global medical system, the causes (labels) of diseases are annotated and reported by doctors, and these labels are potentially noisy due to differences in the doctors' training backgrounds (Ng et al., 2021). When certain causes and cases can only be found in data clients from country A but not country B, the observed noisy label classes in country A will differ from those in country B. We say such a federated learning system has an openset noise problem if the observed label space differs across clients.

We observe that the above openset label noise poses significant challenges if we apply existing learning-with-noisy-labels solutions locally at each client. For instance, a good number of these solutions operate with centralized training data and rely on the design of robust loss functions (Natarajan et al., 2013; Patrini et al., 2017; Ghosh et al., 2017; Zhang & Sabuncu, 2018; Feng et al., 2021; Wei & Liu, 2021; Zhu et al., 2021a). Implementing these approaches often requires assumptions that are likely to be violated if we directly employ these centralized solutions in a federated learning setting.
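To make the setting concrete, here is a toy simulation (ours, not from the paper) of openset label noise: two clients draw clean labels from the same label space, but each client's observed noisy label space is a different subset. All numbers, names, and the corruption model below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean labels live in {0, 1, 2, 3}, but each client only ever *reports*
# labels from its own subset (openset noise across clients).
clean_label_space = [0, 1, 2, 3]
observed_space_A = [0, 1, 2]   # client A never reports class 3
observed_space_B = [1, 2, 3]   # client B never reports class 0

def corrupt(clean_labels, observed_space, noise_rate, rng):
    """With prob. `noise_rate` (or whenever the clean class is outside the
    client's observed space), resample the label uniformly from the
    client's observed label space."""
    noisy = []
    for y in clean_labels:
        if rng.random() < noise_rate or y not in observed_space:
            noisy.append(int(rng.choice(observed_space)))
        else:
            noisy.append(int(y))
    return noisy

clean_A = rng.choice(clean_label_space, size=1000)
noisy_A = corrupt(clean_A, observed_space_A, noise_rate=0.2, rng=rng)
# Even though clean_A spans 4 classes, noisy_A only ever shows {0, 1, 2}.
```

A transition matrix estimated from `noisy_A` alone can never recover the column for class 3, which previews the estimation failure discussed next.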
For example, loss correction is a popular design of robust loss functions (Patrini et al., 2017; Natarajan et al., 2013; Liu & Tao, 2015; Scott, 2015; Jiang et al., 2022), where the key step is to correctly estimate the label noise transition matrix (Bae et al., 2022; Zhang et al., 2021b; Zhu et al., 2021b; 2022). When ground-truth labels are unavailable, correctly estimating this matrix requires observing the full label space. In FL, where the transition matrix is often estimated only from the local openset noisy labels, existing estimators of the noise transition matrix fail. Moreover, even if we could estimate the transition matrix as well as if we had the ground-truth labels for the local instances, the absence of some label classes would still make the estimate differ from the ground-truth one, again leading to failures (detailed example in Section 3.2).

Given the difficulties in estimating the noise transition matrix, we develop a new solution, FedPeer, to tackle the challenge of learning from openset noisy labels in FL. Our solution is inspired by the idea of "contrastive labels", whose implementation does not require knowledge of the noise transition matrix. Notable examples of contrastive labels in the learning-with-noisy-labels literature include negative labels (Kim et al., 2019; Wei et al., 2022a), peer labels (Liu & Guo, 2020), and complementary labels (Ishiguro et al., 2022; Feng et al., 2020). The high-level idea is to introduce a negative loss term using contrastive labels to punish a model for overfitting to the noisy label distribution. Nonetheless, applying these approaches requires sampling global "contrastive" noisy labels: constructing contrastive labels locally in each client would again be problematic, since different clients may have different noisy label spaces in the openset noise setting.
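As background, peer loss (Liu & Guo, 2020), which FedPeer adopts as a building block, subtracts a loss term evaluated on independently paired predictions and labels. Below is a minimal centralized NumPy sketch; the function names and the weight `alpha` are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer `labels` under softmax(`logits`)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def peer_loss(logits, labels, rng, alpha=1.0):
    """Peer loss: the usual CE term minus CE on *independently* permuted
    predictions and labels, which penalizes blindly fitting the marginal
    (noisy) label distribution."""
    n = len(labels)
    peer_pred = rng.permutation(n)    # randomly paired predictions
    peer_label = rng.permutation(n)   # independently paired labels
    return (cross_entropy(logits, labels)
            - alpha * cross_entropy(logits[peer_pred], labels[peer_label]))
```

In the openset FL setting, drawing the labels for the second term only from a client's local noisy label space would bias that term, which is what motivates communicating labels across clients.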
Our solution FedPeer has an explicit step to communicate labels among clients in a differentially private (Dwork, 2008; Dwork et al., 2014) way. Our contributions are summarized as follows.

• We formally define the openset noise problem in FL, which is more practical than the existing homogeneous noisy label assumptions. We also motivate the challenges that come with openset noise by analyzing the failure cases of existing popular noisy-learning solutions such as loss correction (Natarajan et al., 2013; Patrini et al., 2017; Liu & Tao, 2015).

• We propose a novel framework, FedPeer, to solve the openset label noise problem. FedPeer builds on the idea of contrastive labels and adopts peer loss (Liu & Guo, 2020) as a building block.

• To bridge the gap between the centralized usage of contrastive labels and the federated one, we propose a label communication algorithm with a differential privacy (DP) guarantee. We also prove that, benefiting from label communication, the gradient update of aggregating local peer losses with FedAvg is identical to that of the centralized implementation of peer loss, thereby establishing its robustness to label noise.

• We empirically compare FedPeer with several baseline methods on both benchmark datasets and practical scenarios, showing that directly applying centralized solutions locally does not work for FL with openset label noise, and that FedPeer significantly improves performance.
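This excerpt does not specify the label communication mechanism itself. One standard way to share categorical labels under local differential privacy is k-ary randomized response, sketched below; this is an assumption-laden illustration, and the paper's actual randomized label generation may differ.

```python
import math
import numpy as np

def randomized_response(label, num_classes, epsilon, rng):
    """k-ary randomized response: report the true label with probability
    e^eps / (e^eps + K - 1), otherwise a uniformly random *other* label.
    The ratio of these probabilities is exactly e^eps, so the mechanism
    satisfies epsilon-local differential privacy."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < p_keep:
        return label
    others = [c for c in range(num_classes) if c != label]
    return int(rng.choice(others))

rng = np.random.default_rng(0)
# A client shares privatized versions of label 3 out of 10 classes.
shared = [randomized_response(3, num_classes=10, epsilon=2.0, rng=rng)
          for _ in range(5)]
```

Smaller `epsilon` means more randomization (stronger privacy) and noisier shared labels, so the privacy budget trades off directly against the quality of the communicated contrastive labels.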

2. RELATED WORKS

Federated learning is a collaborative training method that makes full use of data from every client without sharing the data. FedSGD (Shokri & Shmatikov, 2015) exchanges gradients between the server and the clients. To improve performance, FedAvg (McMahan et al., 2017) was proposed, which instead exchanges model weights between the server and the clients. In practice, the openset problem is common in FL because data sources vary greatly across clients, and some classes may appear only at specific clients. Many works analyze and address the non-IID problem in FL (Zhao et al., 2018; Li et al., 2019; 2021; Zhang et al., 2021a; Li et al., 2020b; Karimireddy et al., 2020; Andreux et al., 2020).

Label noise is common in the real world (Agarwal et al., 2016; Xiao et al., 2015; Zhang et al., 2017; Wei et al., 2022b). Traditional works on noisy labels usually assume the label noise is class-dependent, where the noise transition probability from a clean class to a noisy class depends only on the label class. There are many statistically guaranteed solutions based on this assumption (Natarajan et al., 2013; Menon et al., 2015; Liu & Tao, 2015; Liu & Guo, 2020). However, this assumption fails to model situations where different groups of data have different noise patterns (Wang et al., 2021). For example, different clients are likely to have different noisy label spaces, resulting in totally different underlying noise transitions. Existing works on federated learning with noisy labels mainly assume the noisy label spaces are identical across different clients (Yang et al., 2022; Xu et al., 2022). There are other notable centralized solutions that rely on the memorization effect of a large model (e.g., a deep neural network) (Li et al., 2020a; Liu, 2021; Song et al., 2019; Xia et al., 2021; Liu et al., 2020; Cheng et al., 2020).
However, in a federated learning system, simply relying on the memorization effect would fail: since the local data is likely to be imbalanced and limited in amount, the model can perfectly memorize all local noisy samples during local training (Han et al., 2020; Liu, 2021).
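For concreteness, the FedAvg aggregation referenced in this section is a sample-count-weighted average of the clients' model parameters. The sketch below is our minimal illustration (function and variable names are ours) and omits the surrounding rounds of local SGD.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average client parameter vectors, weighting each
    client by its number of local training samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)          # shape: (n_clients, dim)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

# Two clients holding 100 and 300 samples: the larger client dominates.
w = fedavg_aggregate([np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                     [100, 300])
# → array([0.25, 0.75])
```

Because this aggregation is linear in the client updates, it is the step through which a per-client loss (such as a local peer loss) can be related to its centralized counterpart.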

