GROUP-WISE VERIFIABLE DISTRIBUTED COMPUTING FOR MACHINE LEARNING UNDER ADVERSARIAL ATTACKS

Abstract

Distributed computing has been a promising approach for accelerating machine learning training on large-scale datasets by utilizing multiple workers in parallel. However, two major issues remain to be addressed: i) adversarial attacks from malicious workers, and ii) the effect of slow workers, known as stragglers. In this paper, we tackle both problems simultaneously by proposing Group-wise Verifiable Coded Computing (GVCC), which leverages coding techniques and group-wise verification to provide robustness against adversarial attacks and resilience to stragglers in distributed computing. The key idea of GVCC is to verify a group of computation results from workers at a time, while providing straggler resilience by encoding the tasks assigned to workers with Group-wise Verifiable Codes. Experimental results show that GVCC outperforms existing methods in overall processing time and verification time for matrix multiplication, a key computational component of machine learning and deep learning.

1. INTRODUCTION

Recently, machine learning and big data analysis have achieved huge success in areas such as computer vision, natural language processing, and reinforcement learning. Since these applications usually demand a massive amount of computation on large datasets, there has been increasing interest in distributed systems, in which one node serves as a master and the others serve as workers. One option is distributed computing (Dalcín et al., 2005; 2011), where the master divides the overall task into subtasks (each requiring far less memory than the original task) and distributes them to workers, which compute their assigned subtasks and send the results back to the master. Distributed computing is particularly well suited to matrix multiplication, the most important and frequent computational block in machine learning.

In a distributed setting, however, there are two foremost considerations when embedding distributed computing in machine learning applications: i) stragglers and ii) adversarial workers. Stragglers are workers that return their computation results much more slowly than the others, and they have been reported to be a serious bottleneck in large-scale computation (Dean & Barroso, 2013; Huang et al., 2017; Tandon et al., 2017). To handle straggler effects, coded computing was first suggested in Lee et al. (2018). In coded computing, the master encodes the computation task with a coding technique that introduces redundancy into the task allocation. Owing to this redundancy, the master does not need all task results to recover the final output and can therefore ignore stragglers. This approach has been applied to various computation tasks, especially matrix multiplication (Dutta et al., 2016; Yu et al., 2017; Park et al., 2018; Reisizadeh et al., 2019; Dutta et al., 2019; Yu et al., 2020).
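To make the coded-computing idea concrete, the following minimal sketch encodes a row-partitioned matrix multiplication with a simple (3, 2) MDS-style code; the specific splitting and code are illustrative assumptions, not the paper's GVC construction. Any two of the three worker results suffice to recover C = AB, so one straggler can be ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 2))

# Master splits A row-wise and encodes the blocks with a (3, 2) MDS-style
# code: worker tasks are A1*B, A2*B, and (A1+A2)*B.
A1, A2 = A[:2], A[2:]
tasks = [A1, A2, A1 + A2]

# Workers compute their encoded tasks; suppose worker 1 (holding A2) straggles,
# so the master only receives results from workers 0 and 2.
results = {i: Ai @ B for i, Ai in enumerate(tasks) if i != 1}

# Master decodes the missing block: A2*B = (A1+A2)*B - A1*B.
C_top = results[0]
C_bottom = results[2] - results[0]
C = np.vstack([C_top, C_bottom])

assert np.allclose(C, A @ B)  # full product recovered despite the straggler
```

The redundancy costs one extra worker here; real coded-computing schemes (e.g., polynomial codes) generalize this so that any k of n results suffice.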
Moreover, some of the workers could be adversarial workers, which send perturbed results to the master to contaminate or degrade the performance of neural networks. Many studies (Biggio et al., 2012; Blanchard et al., 2017; El Mhamdi et al., 2018; Sohn et al., 2020; Bagdasaryan et al., 2020; Wang et al., 2020) demonstrate that adversarial workers slow down the overall training process. To be robust against adversarial attacks, the master can check whether the computation results from workers are correct in order to identify adversarial workers. Based on this idea, recent studies have suggested encoding schemes that tolerate wrong computation results when recovering the final product and identify adversarial workers from the returned results (Yu et al., 2019; Soleymani et al., 2021; Hong et al., 2022). Another line of research, which we call verifiable computing, has focused on identifying adversarial workers with a fast verification procedure: the master verifies the correctness of computation results using a verification key (Freivalds, 1977; Tang et al., 2022).

In this paper, we focus on designing a fast and robust distributed computing system for matrix multiplication. We use both a coding approach and verifiable computing to combat adversarial attacks from malicious workers as well as straggler effects. Our system model is depicted in Fig. 1(b): the master encodes the computation tasks, distributes them to the workers, and decodes the final product after receiving and verifying the workers' computation results. Specifically, we suggest a novel encoding scheme that enables verifying a large number of computation results at a time, while decoding the final product from only a subset of the workers' results to mitigate straggler effects.
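The classic instance of such a verification procedure is Freivalds' randomized check (Freivalds, 1977): instead of recomputing AB in O(n^3), the master tests a claimed product C against random probe vectors in O(n^2) per trial. The sketch below is the standard algorithm, not the paper's verification key construction.

```python
import numpy as np

def freivalds_check(A, B, C, trials=20, rng=None):
    """Probabilistically verify the claim C == A @ B.

    Each trial draws a random 0/1 vector r and compares A @ (B @ r)
    with C @ r; a wrong C passes a single trial with probability at
    most 1/2, so `trials` rounds drive the error below 2**-trials.
    """
    rng = rng or np.random.default_rng()
    n = B.shape[1]
    for _ in range(trials):
        r = rng.integers(0, 2, size=(n, 1))
        if not np.allclose(A @ (B @ r), C @ r):
            return False  # mismatch: C is definitely not A @ B
    return True  # correct with high probability

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((4, 6))
assert freivalds_check(A, B, A @ B)       # honest result accepted
assert not freivalds_check(A, B, A @ B + 1)  # perturbed result rejected
```

Because each trial costs only three matrix-vector products, the master can vet workers' results far more cheaply than by recomputing their tasks.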
We experiment with various settings and demonstrate that our proposed scheme can identify adversarial workers and obtain the final computation product much faster than existing methods. The main contributions of this paper are as follows.

• We propose Group-wise Verifiable Coded Computing (GVCC), which handles both straggler effects and adversarial attacks in distributed computing for matrix multiplication. Specifically, we suggest group-wise verifiable codes (GVC) for encoding the workers' computation tasks and provide a matching group-wise verification algorithm.



We provide more explanation and detailed experiment settings in Appendix A.
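To convey the flavor of verifying a group of results at a time, the sketch below batches Freivalds-style checks over several claimed products with a random linear combination. This is a generic batched test under illustrative assumptions; the paper's actual GVC construction and verification algorithm are different and are detailed later.

```python
import numpy as np

def verify_group(As, B, Cs, trials=10, rng=None):
    """Check several claims C_i == A_i @ B in one batched test.

    Each trial draws random group weights beta_i and a random 0/1
    probe vector r, then compares sum_i beta_i * A_i @ (B @ r)
    against sum_i beta_i * C_i @ r, amortizing the check over the
    whole group instead of testing each result separately.
    """
    rng = rng or np.random.default_rng()
    n = B.shape[1]
    for _ in range(trials):
        beta = rng.integers(1, 7, size=len(As))  # random group weights
        r = rng.integers(0, 2, size=(n, 1))      # random probe vector
        lhs = sum(b * (Ai @ (B @ r)) for b, Ai in zip(beta, As))
        rhs = sum(b * (Ci @ r) for b, Ci in zip(beta, Cs))
        if not np.allclose(lhs, rhs):
            return False  # at least one result in the group is wrong
    return True  # all results correct with high probability

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 5))
As = [rng.standard_normal((3, 4)) for _ in range(3)]
Cs = [Ai @ B for Ai in As]
assert verify_group(As, B, Cs)  # an honest group passes
Cs[1] = Cs[1] + 1               # one worker perturbs its result
assert not verify_group(As, B, Cs)  # the batched test flags the group
```

A batched test of this kind accepts or rejects a whole group at once; on rejection, the master can fall back to per-worker checks to localize the adversary.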



Figure 1: (a) Training curve of neural networks under adversarial attacks. (b) Distributed computing for a matrix multiplication C = AB under adversarial workers and stragglers.

