BYZANTINE-ROBUST DECENTRALIZED LEARNING VIA CLIPPEDGOSSIP

Abstract

In this paper, we study the challenging task of Byzantine-robust decentralized training on arbitrary communication graphs. Unlike federated learning, where workers communicate through a server, workers in the decentralized setting can only talk to their neighbors, making it harder to reach consensus and benefit from collaborative training. To address these issues, we propose CLIPPEDGOSSIP, an algorithm for Byzantine-robust consensus and optimization that is the first to provably converge to a O(δ_max ζ²/γ²) neighborhood of a stationary point for non-convex objectives under standard assumptions. Finally, we demonstrate the encouraging empirical performance of CLIPPEDGOSSIP under a large number of attacks.

1. INTRODUCTION

"Divide et impera". Distributed training has become an important topic due to the privacy constraints of decentralized data storage (McMahan et al., 2017; Kairouz et al., 2019). As the server-worker paradigm suffers from a single point of failure, a growing body of work studies training in the absence of a server (Lian et al., 2017; Nedic, 2020; Koloskova et al., 2020b). We are particularly interested in decentralized scenarios where direct communication may be unavailable due to physical constraints. For example, devices in a sensor network can only communicate with devices within a short physical distance.

Failures-from malfunctioning or even malicious participants-are ubiquitous in all kinds of distributed computing. A Byzantine adversarial worker can deviate from the prescribed algorithm, send arbitrary messages, and is assumed to have knowledge of the whole system (Lamport et al., 2019). That is, Byzantine workers not only collude, but also know the data, algorithms, and models of all regular workers. However, they cannot directly modify the states of regular workers, nor compromise messages sent between two connected regular workers.

Defending against Byzantine attacks in a communication-constrained graph is challenging. As secure broadcast protocols are no longer available (Pease et al., 1980; Dolev & Strong, 1983; Hirt & Raykov, 2014), regular workers can only utilize information from their own neighbors, who may have heterogeneous data distributions or be malicious, making it very difficult to reach global consensus. While some works attempt to solve this problem (Su & Vaidya, 2016a; Sundaram & Gharesifard, 2018), their strategies suffer from serious drawbacks: 1) they require regular workers to be very densely connected; 2) they only show asymptotic convergence, or provide no convergence proof; 3) there is no evidence that their algorithms are better than training alone.
In this work, we study Byzantine-robust decentralized training on a constrained topology and address the aforementioned issues. The main contributions of our paper are summarized as follows:

• We identify a novel network robustness criterion, characterized in terms of the spectral gap of the topology (γ) and the number of attackers (δ), for consensus and decentralized training, applying to a much broader spectrum of graphs than (Su & Vaidya, 2016a; Sundaram & Gharesifard, 2018).

• We propose CLIPPEDGOSSIP as the defense strategy and provide, for the first time, precise rates of robust convergence to a O(δ_max ζ²/γ²) neighborhood of a stationary point for stochastic objectives under standard assumptions.¹ We also empirically demonstrate the advantages of CLIPPEDGOSSIP over previous works.

• Along the way, we also obtain the fastest convergence rates for standard non-robust (Byzantine-free) decentralized stochastic non-convex optimization by using local worker momentum.
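The precise update rule of CLIPPEDGOSSIP is defined later in the paper; as a rough illustration of the self-centered clipping idea suggested by its former name, the sketch below shows one plausible gossip round in which each worker clips the differences to its neighbors' iterates around its own iterate before averaging. The function names, the mixing matrix W, and the clipping radius tau are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def clip(z, tau):
    # Scale z so its norm is at most tau; leave it unchanged otherwise.
    norm = np.linalg.norm(z)
    return z if norm <= tau else z * (tau / norm)

def clipped_gossip_step(x, W, tau):
    """One illustrative gossip round (assumed form, not the paper's exact rule).

    Each worker i moves toward its neighbors, but clips every difference
    (x_j - x_i) at radius tau around its own iterate, which bounds the
    influence any single (possibly Byzantine) neighbor can exert.

    x: (n, d) array of worker iterates; W: (n, n) nonnegative mixing matrix.
    """
    n, _ = x.shape
    x_new = np.empty_like(x)
    for i in range(n):
        update = sum(
            W[i, j] * clip(x[j] - x[i], tau)
            for j in range(n) if W[i, j] > 0
        )
        x_new[i] = x[i] + update
    return x_new
```

With plain (unclipped) gossip, a single outlier worker can drag a neighbor arbitrarily far in one round; under the clipped update, the pull from any one neighbor is bounded by its mixing weight times tau.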



¹ In a previous version, we referred to CLIPPEDGOSSIP as self-centered clipping.

