

Abstract

Emerging edge intelligence applications require the server to continuously retrain and update deep neural networks deployed on remote edge nodes in order to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects only a subset of weights to update at each server-to-edge communication round, while achieving performance comparable to full updating. Our method is established by analytically upper-bounding the loss difference between partial updating and full updating, and updates only the weights that contribute most to this upper bound. Extensive experimental results demonstrate the efficacy of our partial updating methodology, which achieves a high inference accuracy while updating only a rather small number of weights.

1. INTRODUCTION

To deploy deep neural networks (DNNs) on resource-constrained edge devices, extensive research has been done to compress a well-trained model via pruning (Han et al., 2016; Renda et al., 2020) and quantization (Courbariaux et al., 2015; Rastegari et al., 2016). During on-device inference, compressed networks may achieve a good balance between model performance (e.g., prediction accuracy) and resource demand (e.g., memory, computation, energy). However, due to the lack of relevant training data or an unknown sensing environment, pre-trained DNN models may not yield satisfactory performance. Retraining the model by leveraging newly collected data (from edge devices or from other sources) is needed for desirable performance. Example application scenarios of relevance include vision-based robotic sensing in an unknown environment (e.g., Mars) (Meng et al., 2017), local translators on mobile phones (Bhandare et al., 2019), and acoustic sensor networks deployed in Alpine environments (Meyer et al., 2019). It is usually infeasible to perform on-device retraining on edge devices due to their resource-constrained nature. Instead, retraining often occurs on a remote server with sufficient resources. One possible strategy to continuously improve the model performance on edge devices is a two-stage iterative process: (i) at each round, edge devices collect new data samples and send them to the server, and (ii) the server retrains the network using all collected data, and then sends the updates to each edge device (Brown & Sreenan, 2006). An essential challenge herein is that the transmissions in the second stage are highly constrained by the limited communication resource (e.g., bandwidth, energy) in comparison to the first stage.
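The two-stage iterative process above can be sketched as follows. This is a minimal, self-contained toy simulation, not the paper's implementation: `collect_samples`, `retrain`, and the toy averaging update rule are illustrative stand-ins, and the byte counter simply assumes 4 bytes per weight to highlight where the server-to-edge communication cost accrues.

```python
import random

def collect_samples(n=8):
    """Stage (i): edge devices gather new data samples and upload them."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]

def retrain(weights, dataset):
    """Stage (ii): server retrains on *all* data collected so far.
    (Toy update rule standing in for actual gradient-based training.)"""
    mean = sum(dataset) / len(dataset)
    return [0.9 * w + 0.1 * mean for w in weights]

def run_rounds(weights, num_rounds=5):
    dataset, sent_bytes = [], 0
    for _ in range(num_rounds):
        dataset.extend(collect_samples())   # edge -> server: a few samples
        weights = retrain(weights, dataset)
        sent_bytes += 4 * len(weights)      # server -> edge: the FULL model
    return weights, sent_bytes

weights, sent = run_rounds([0.0] * 1000)
```

Even in this toy setting, the downlink cost grows with the full model size every round, while the uplink carries only a handful of samples; this asymmetry is exactly the bottleneck motivating partial updating.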
State-of-the-art DNN models often require tens or even hundreds of megabytes (MB) to store their parameters, whereas a single batch of data samples (a number of samples that can lead to reasonable updates in batch training) requires a much smaller amount of data. For example, on the CIFAR-10 dataset (Krizhevsky et al., 2009), the weights of a popular VGGNet require 56.09MB of storage, while one batch of 128 samples uses only around 0.40MB (Simonyan & Zisserman, 2015; Rastegari et al., 2016). As an alternative, the server could send a full update only once or rarely; in this case, however, every node suffers from low performance until such an update occurs. Besides, edge devices could select and send only critical samples by using active learning schemes (Ash et al., 2020). The server may also receive training data from other sources, e.g., through data augmentation or new data collection campaigns. These considerations indicate that the updated weights sent by the server to the edge devices in the second stage become a major bottleneck. To resolve the above challenges pertaining to updating the network, we propose to partially update the network by changing only a small subset of the weights at each round. Doing so can significantly
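A sketch of what such a weight-wise partial update could look like is given below. Note the simplifying assumption: the paper ranks weights by their contribution to an upper bound on the loss difference between partial and full updating, whereas this sketch substitutes a much cruder proxy, the magnitude of the weight change, purely for illustration. The model size (one million float32 weights) and the 1% update budget are likewise arbitrary choices, not figures from the paper.

```python
import numpy as np

def partial_update(w_old, w_new, k):
    """Apply only the k largest-magnitude weight changes.
    (Proxy criterion; the paper instead ranks weights by their
    contribution to a loss-difference upper bound.)"""
    delta = w_new - w_old
    idx = np.argsort(np.abs(delta))[-k:]   # indices of top-k changed weights
    w_partial = w_old.copy()
    w_partial[idx] = w_new[idx]
    return idx, w_partial

rng = np.random.default_rng(0)
w_old = rng.normal(size=1_000_000).astype(np.float32)   # deployed weights
w_new = w_old + rng.normal(scale=0.01,                  # server-retrained
                           size=w_old.size).astype(np.float32)

k = w_old.size // 100                  # communication budget: update 1%
idx, w_partial = partial_update(w_old, w_new, k)

# Payload comparison: a full update resends 4 bytes per weight, while a
# partial update sends one (4-byte index, 4-byte value) pair per kept weight.
full_mb = 4 * w_old.size / 2**20
part_mb = 8 * k / 2**20
```

Under this simple (index, value) encoding, the partial update here is roughly 50x smaller than the full one; the paper's contribution lies in choosing *which* subset of weights to send so that this saving costs little accuracy.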

