GROUP-WISE VERIFIABLE DISTRIBUTED COMPUTING FOR MACHINE LEARNING UNDER ADVERSARIAL ATTACKS

Abstract

Distributed computing has been a promising solution in machine learning for accelerating the training procedure on large-scale datasets by utilizing multiple workers in parallel. However, two major issues still need to be addressed: i) adversarial attacks from malicious workers, and ii) the effect of slow workers, known as stragglers. In this paper, we tackle both problems simultaneously by proposing Group-wise Verifiable Coded Computing (GVCC), which leverages coding techniques and group-wise verification to provide robustness to adversarial attacks and resilience to straggler effects in distributed computing. The key idea of GVCC is to verify a group of computation results from workers at a time, while providing resilience to stragglers by encoding the tasks assigned to workers with Group-wise Verifiable Codes. Experimental results show that GVCC outperforms existing methods in terms of overall processing time and verification time for executing matrix multiplication, a key computational component in machine learning and deep learning.

1. INTRODUCTION

Recently, machine learning and big data analysis have achieved huge success in various areas such as computer vision, natural language processing, and reinforcement learning. Since they usually demand a massive amount of computation on large datasets, there has been increasing interest in distributed systems, where one node serves as a master and the others serve as workers. One possible option is distributed computing (Dalcín et al., 2005; 2011), where the master divides the original task and distributes sub-tasks (each requiring far less memory than the original task) to workers, which compute their assigned tasks and send the results back to the master. Distributed computing can be utilized to compute matrix multiplication, the most important and frequent computation block in machine learning. In a distributed setting, however, there are two foremost considerations when embedding distributed computing in machine learning applications: i) stragglers and ii) adversarial workers. Stragglers are workers that return their computation results much more slowly than others. It has been reported that stragglers can be a serious bottleneck when performing large-scale computation tasks (Dean & Barroso, 2013; Huang et al., 2017; Tandon et al., 2017). To handle straggler effects, coded computing was first suggested in Lee et al. (2018). In coded computing, a master encodes a computation task with a coding technique, introducing redundancy into the task allocation. Due to the redundancy arising from the coding technique, the master does not need all task results to obtain the final output and can ignore stragglers. This approach has been applied to various computation tasks, especially matrix multiplication (Dutta et al., 2016; Yu et al., 2017; Park et al., 2018; Reisizadeh et al., 2019; Dutta et al., 2019; Yu et al., 2020).
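To make the straggler-tolerance idea concrete, the following is a minimal sketch (not the GVCC scheme itself) of coded distributed matrix multiplication in the style of Lee et al. (2018): the master splits a matrix A into two halves, sends the two halves plus their sum to three simulated workers, and can then recover Ax from any two of the three returned results, so one straggler can be ignored. All variable names and the worker setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # full matrix held by the master
x = rng.standard_normal(3)        # input vector to be multiplied

# Split the task: each sub-matrix needs half the memory of A.
A1, A2 = A[:2], A[2:]

# (3, 2)-MDS-style encoding: three encoded tasks, any two results suffice.
tasks = {"w1": A1, "w2": A2, "w3": A1 + A2}

# Each (simulated) worker computes its encoded sub-task.
results = {worker: Ai @ x for worker, Ai in tasks.items()}

# Suppose worker w2 is a straggler: decode Ax from w1 and w3 only.
top = results["w1"]                       # A1 @ x
bottom = results["w3"] - results["w1"]    # (A1 + A2)x - A1 x = A2 x
Ax = np.concatenate([top, bottom])

assert np.allclose(Ax, A @ x)
```

The redundancy here is exactly one extra encoded task; schemes such as polynomial codes generalize this so that any k out of n worker results suffice, at the cost of a factor n/k in total computation.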
Moreover, some of the workers could be adversarial workers, which send perturbed results to the master to contaminate or degrade the performance of neural networks. Many studies (Biggio et al., 2012; Blanchard et al., 2017; El Mhamdi et al., 2018; Sohn et al., 2020; Bagdasaryan et al., 2020; Wang et al., 2020) demonstrate that adversarial workers slow down the overall training process and

