LINEAR SCALARIZATION FOR BYZANTINE-ROBUST LEARNING ON NON-IID DATA

Abstract

In this work we study the problem of Byzantine-robust learning when data among clients is heterogeneous. We focus on poisoning attacks targeting the convergence of SGD. Although this problem has received great attention, the main Byzantine defenses rely on the IID assumption, causing them to fail when the data distribution is non-IID even in the absence of an attack. We propose the use of Linear Scalarization (LS) as an enhancing method to enable current defenses to circumvent Byzantine attacks in the non-IID setting. The LS method is based on the incorporation of a trade-off vector that penalizes the suspected malicious clients. Empirical analysis corroborates that the proposed LS variants are viable in the IID setting. For mild to strong non-IID data splits, LS is either comparable to or outperforms current approaches under state-of-the-art Byzantine attack scenarios.

1. INTRODUCTION

Most real-world applications using learning algorithms are moving towards distributed computation for two main reasons: (i) some applications are inherently distributed, Federated Learning (FL) for instance; (ii) distribution speeds up computation and benefits from hardware parallelization. In particular, we resort to distributing Stochastic Gradient Descent (SGD) to alleviate the heavy computation underlying gradient updates during training, especially given the high dimensionality of large-scale deep learning models and the exponential growth in user-generated data. However, distributing computation comes at the cost of introducing challenges related to consensus and fault tolerance. In other words, the nodes composing the distributed system need to reach consensus regarding the gradient update. In the honest setting this can be achieved simply by a parameter server in charge of aggregating the workers' computations. However, machines are prone to hardware failure (crash/stop) or arbitrary behavior due to bugs or malicious users. The latter is more concerning, as machines may collude and lead to convergence to ineffective models. Since deep learning pipelines are involved in decision making at a critical level (e.g., computer-aided diagnosis, airport security), it is crucial to ensure their robustness. We study robustness in the sense of granting resilience against malicious adversaries; more precisely, poisoning attacks that target the convergence of SGD. The adversarial model follows the Byzantine abstraction Lamport et al. (1982). The basis of distributed Byzantine attacks is the disruption of SGD's convergence by tampering with the direction of the descent or the magnitude of the updates. The robustness problem is highly examined and there exists a plethora of aggregation rules Blanchard et al. (2017); Alistarh et al. (2018); Yin et al. (2018); Damaskinos et al. (2018); Boussetta et al. (2021); El Mhamdi et al. (2018). Nonetheless, recent works Karimireddy et al. (2022; 2021) highlight the inability of these algorithms to learn on non-IID data.

Indeed, defending against Byzantine workers in a heterogeneous setting is not trivial. Naturally, aggregation rules rely on the similarity between honest workers to defend against the Byzantine. However, as data becomes unbalanced, distinguishing malicious workers from honest ones becomes increasingly challenging: an honest worker may slightly deviate from its peers due to a skewed data distribution. Another weakness of current work is that most aggregation rules discard a subset of the information, either by dropping full gradient vectors or by eliminating a set of coordinates along each dimension of the submitted vectors. Dropping users' updates leads to (i) degradation of the model's final accuracy, especially when no Byzantine attackers are present, and (ii) possible exclusion of minorities with vastly diverging views, as elaborated in Mhamdi et al. (2021).
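To make the contrast concrete, the following sketch illustrates the trade-off-vector idea behind Linear Scalarization: instead of discarding suspected gradients outright, the server aggregates all of them under a weight vector that penalizes suspected malicious clients. This is a minimal illustration, not the paper's exact algorithm; the names `linear_scalarization_aggregate` and `suspicion_scores` are our own, and how suspicion scores are produced (e.g., by an existing distance-based defense) is an assumption left open here.

```python
import numpy as np

def linear_scalarization_aggregate(gradients, suspicion_scores):
    """Aggregate client gradients with a trade-off vector.

    gradients: (n_clients, dim) array of submitted gradient vectors.
    suspicion_scores: (n_clients,) array in [0, 1]; higher values mean
        the client is more strongly suspected of being Byzantine.
        (Illustrative assumption: scores would come from an existing
        defense; the paper does not prescribe this exact interface.)

    Rather than dropping suspected vectors, every gradient contributes,
    but suspected clients are down-weighted by the trade-off vector.
    """
    weights = 1.0 - np.asarray(suspicion_scores, dtype=float)
    weights = weights / weights.sum()  # normalize the trade-off vector
    return weights @ np.asarray(gradients, dtype=float)
```

For example, with two honest workers whose gradients point near [1, 0] and one fully suspected Byzantine worker submitting [-10, 10], the weighted aggregate stays close to the honest mean while no vector is hard-dropped, which is the behavior the penalization is meant to capture.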

